PREFACE

The only variation from current practice in the organization 
of the subject matter of this study is the changing of the position 
of the conclusions. The writer has stated his conclusions at the 
very beginning of the study. This was done to enable the busy 
reader to see at a glance the general trend of the study. 

As the practical school administrator knows, the gathering of 
the kind of data upon which this study is based is a matter of 
some difficulty. It is the kind of data, however, which we must 
have if science is to aid in the selection of teachers. 

Some psychologists may contend that an analysis of teaching 
rather than the correlation of observable facts with varying 
amounts of success of actual teachers is the only correct method 
of determining what tests will distinguish good from poor 
teachers. No one would deny the value of sagacious insight into 
any problem of human engineering. So far neither the analysis 
method nor the correlation method has done very well in practice 
on the practical job. This study is based on the correlation 
method. Its shortcomings should not be confused either with 
the logical soundness or with the practical superiority of test 
construction on a basis of correlation between test scores and 
performance. 

Frederic B. Knight.

CONCLUSIONS BASED ON THIS STUDY

This thesis deals with the problems of isolating the significant 
and measurable qualities of effective teaching and the methods of 
measuring these qualities. It is a continuation of similar studies 
of which the work of Meriam was the first. A rating of 153 
high-school and elementary-school teachers was obtained by hav- 
ing the teachers rate each other for the quality of general teaching 
ability and other traits. While it may be said that the teachers 
knew each other only in a social way and, therefore, could not 
rate each other for general teaching ability, the data show that 
an adequate rating can be procured by this method.

The statistical treatment of the data shows that: 

1. Chance halves of the mutual ratings of the teachers correlate
with each other +.899, ±.01.

2. The mutual ratings of the teachers correlate with the ratings
of the supervisors +.962, ±.001.

3. The mutual ratings of the teachers correlate with pupils' 
estimates +.681, ±.05. 

4. There is also substantial agreement among the mutual ratings
of teachers, when they rate for specific traits, such as
intellectual strength and skill in discipline. The average correlations
between chance halves of the ratings are respectively +.879,
±.016, and +.838, ±.023. These correlations are evidences
of the ability of teachers to rate each other.

The ratings for general teaching ability, secured in this way,
were used as measures of teaching merit, against which objective
facts were correlated. The correlations between general teaching
ability and age, amount of experience, quality of handwriting,
intelligence as measured by test, major academic interests,
normal-school scholarship, amount of professional study during active
service, and ability to pass a professional test have been secured.
The correlations are too low to warrant one in using these factors
for prognostic purposes, except ability to pass a professional test
(+.541), normal-school scholarship (+.153) and intelligence
(+.108). By using the coefficient of partial correlation we find, in
the case of elementary school teachers, that, the factors of
intelligence and normal-school scholarship being constant, there is a
mutual relationship of +.57 between ability to teach and ability
to pass a professional test. Professional tests may be used to
estimate probable success in teaching. The amount of professional
study accomplished during active service is also indicative
of success in teaching. The number of teachers who had accomplished
professional study of this sort was too small in the groups
which were studied to allow an accurate determination of the
degree of significance that professional study has.
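
The coefficient of partial correlation mentioned above can be sketched
as follows. This is only an illustration under stated assumptions: the
three correlations with teaching ability (+.541, +.153, +.108) are the
figures reported above, but the three correlations among the test,
intelligence, and scholarship are hypothetical placeholders, since they
are not given in this section. The sketch therefore does not attempt to
reproduce the reported +.57.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r_two(r12, r13, r14, r23, r24, r34):
    """Second-order partial correlation r12.34.

    Built from first-order partials with variable 3 held constant,
    then holding variable 4 constant as well.
    """
    r12_3 = partial_r(r12, r13, r23)
    r14_3 = partial_r(r14, r13, r34)
    r24_3 = partial_r(r24, r23, r34)
    return partial_r(r12_3, r14_3, r24_3)


# 1 = teaching ability, 2 = professional test,
# 3 = intelligence, 4 = normal-school scholarship.
R12, R13, R14 = 0.541, 0.108, 0.153   # reported in the text
R23, R24, R34 = 0.30, 0.30, 0.40      # HYPOTHETICAL, for illustration only

if __name__ == "__main__":
    value = partial_r_two(R12, R13, R14, R23, R24, R34)
    print(f"r12.34 under the assumed inter-correlations: {value:+.3f}")
```

When the factors held constant are themselves uncorrelated with both
variables, the partial coefficient equals the original correlation,
which is a convenient check on the arithmetic.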

In the case of high-school teachers, intellectual differences, as 
determined by mental tests, appear to be significant. For the 
selection of high-school teachers the use of mental tests would be 
of value. 

These data, as a whole, may be interpreted to mean that the 
general factor of interest in one's work becomes the dominant 
factor in determining one's success in teaching. The reasoning 
which leads to this conclusion is not straightaway, for we have 
not as yet objective tests of interest. We do know, however, that 
other measurable traits, either alone or in combinations, are not 
adequate explanations of teaching success. With our present 
knowledge it is reasonable to suppose that genuine interest in 
one's work accounts for a large part of teaching success. 

In the second part of the study data are presented which show 
the spread of general estimate to particular traits, when judgments 
or ratings are made. For example, when a judge attempts to rate 
a teacher in some particular trait, his rating is a defense of his 
general estimate of that teacher, as well as a rating of the trait 
under consideration. 

The mutual judgments of teachers for the trait, intellectual
ability, correlate with their judgments of general teaching ability
+.935, ±.014.

The mutual judgments of teachers for the trait, skill in
discipline, correlate with their judgments for general teaching ability
+.789, ±.001.

The mutual judgments of teachers for the trait, skill in discipline,
correlate with their judgments for intellectual ability +.863,
±.080.

It would be difficult to hold that these correlations represent the
true relationships which exist between these pairs of traits. The
presence of a large factor of spread of general estimate accounts
best for the size of these correlations.

A study of the correlations between the ratings of 126 teachers 
in a New York school system for 15 traits showed that 105 of the 
120 correlations studied could be accounted for by chance varia- 
tion from an average correlation, even if a perfect, or a 100 per 
cent, spread of general estimate was present. 

A study of the correlation between qualities of teaching as
presented by Boyce in his work, published in the Fourteenth
Yearbook of the National Society for the Study of Education, shows that
85 per cent of the correlations come within a range of ±.150.
These facts can be satisfactorily explained only when a factor of
spread of general estimate is allowed.

It seems fair to conclude, therefore, that in judging particular 
traits general estimate influences the particular estimate to such a 
degree that judgments of particular traits are in themselves of 
little practical use. 



CHAPTER I 

INTRODUCTION 

This thesis¹ lies in the field of research which is concerned
with methods of rating teaching, of determining the significant 
factors in teaching ability, and of measuring objectively such 
factors. 

This field of research is by no means a virgin one nor is it one of 
academic interest only. Practical school administrators have no 
more important and, at present, no more troublesome problems 
than those which are grouped around the technique of selecting
and rating teachers. 

Actual isolation of such factors as intellect and temperament, 
which are indispensable to successful teaching, and the discovery 
of a method of determining whether a prospective teacher possesses
the indispensable qualities of a good teacher would be a boon to 
school administrators. 

During the past fifteen years educators and psychologists have 
given their earnest attention to this series of problems, on which a 
great deal has been written and on which much research work has 
been done. Three studies have been selected to show the general 
development of attempts which have been made to find solutions 
to different phases of the personal-management problems of our 
public schools. 

MERIAM'S STUDY

Dr. L. L. Meriam, in a research study, Normal School Education 
and Efficiency in Teaching, published in 1906, Teachers College 
Contributions to Education, No. 1, Chapter IV, presented data 
which were used to discover the correlation between teaching 
efficiency and scholarship in the normal school. 

"This is the problem," said Dr. Meriam. "Is the efficient 
teacher the proficient scholar? To what extent is he so in each 
of the subjects of the normal-school course? In other words, 
does the one who stands high among fellow-teachers stand rela- 
tively high among fellow-students in the work preparatory to his 

¹ All data used in this study are on file at Teachers College, Columbia
University, New York City.

Qualities Related to Success in Teaching

teaching? Such a study of mental relationships is in itself a study
of causes. If it be found a rule that efficiency in teaching follows
proficiency in scholarship, then, other things being equal, the
latter may be considered a vital contribution to the former. And
this is our present purpose: to discover, so far as possible, what
elements enter into the making of a capable teacher. Corollary
questions are: To what extent does proficiency in scholarship mean
efficiency in teaching? . . ."

In Dr. Meriam's research study an admirable attempt was made 
to find out the relative teaching ability of a large number (1,185) 
of normal-school graduates. Equally careful work was done to 
determine the relative normal-school success of these graduates. 
Meriam had no accurate measure of teaching efficiency and no 
reliable measure to equate the amount of success in one school 
system with the amount of success in another. He encountered 
the same difficulty in interpreting normal-school marks as meas- 
ures of scholastic accomplishment. Great statistical ingenuity 
was shown by Dr. Meriam and his results, by all odds, were 
the most dependable at the time of the publication of his 
thesis. 

The correlation between normal-school standing and ability or 
success in the field was found to be so surprisingly low that dif- 
ferences in scholarship among students in the normal schools 
seemed to bear a negligible relation to future differences in teach- 
ing ability. Meriam found that practice teaching during normal- 
school training was slightly prophetic of the quality of teaching 
which should be expected after graduation. Examinations con- 
taining professional subject-matter did not appear to furnish a 
significant index of an individual's ability to teach. 

The statistical difficulties of Meriam's work should not blind 
one to its value. It clearly stated the problem of correlating 
teaching ability with factors which are more or less objective and 
measurable. It developed a technique of research that was sound 
in theory. It exercised much influence in taking the problem of 
teaching efficiency from the field of opinion and discussion and in 
placing it, where it properly belongs; namely, in the field of re- 
search and objective measurement. 

Meriam's more important findings are expressed as coefficients
of correlation between teaching efficiency and scholarship in
normal-school studies. These he reports as follows:

Correlation between Teaching Ability and Practice Teaching                     +.39
Correlation between Teaching Ability and Psychology                            +.37
Correlation between Teaching Ability and History and Principles of Education   +.28
Correlation between Teaching Ability and Method Courses                        +.29
Correlation between Teaching Ability and Academic Courses                      +.22

Meriam's data also support his conclusion that, after the first 
year of teaching, experience, as such, has little if any influence 
on the improvement of teaching efficiency. 

ELLIOTT'S STUDY

In 1910 another treatment of this general subject was published 
which deserves notice. Dr. Edward C. Elliott presented to the 
second annual convention of city superintendents in Wisconsin 
"A Tentative Scheme for the Measurement of Teaching
Efficiency." This score card has been revised in detail, but the first
scheme included all the essential factors. Elliott stated these 
three propositions which were of more than temporary importance: 

"Is it possible to devise and to apply to the teaching process,
impersonal, quantitative standards, whereby the relative worth
and efficiency of teachers may be determined more justly and with
greater precision than under the ordinary practices of the day?

"Does not the effective organization, administration, and
supervision of public schools require that the conditions and results of
the teachers' work be subjected to measurements of a quantitative
rather than a qualitative nature?

"Is it possible for the present generation to make any reliable
and satisfactory conclusions concerning the direction and rate of
educational progress without standards of value resting upon a
quantitative basis?"

The scheme divides teaching efficiency into seven sections and 
to each section assigns a weight or value. The scheme, in sum- 
mary, is here reproduced: 

  I. Physical Efficiency          12 points
 II. Moral-nature Efficiency      14   "
III. Administrative Efficiency    10   "
 IV. Dynamic Efficiency           24   "
  V. Projected Efficiency          6   "
 VI. Achieved Efficiency          24   "
VII. Social Efficiency            10   "

     Total                       100   "

The value of this scheme is that attention is directed to par- 
ticular traits and that diagnosis of teaching merit is stimulated. 
The suggested values are, of course, matters of opinion. 
The assumption that analysis of the teacher and of the judgment 
of particular qualities, studied in isolation, can be made is highly 
questionable. 

BOYCE'S STUDY

In addition to Meriam's study the only other research of
extensive nature is that made by A. C. Boyce¹ and published under
the title "Methods of Measuring Teachers' Efficiency," Part II
of the Fourteenth Yearbook of the National Society for the Study of
Education. Boyce obtained the rating of a great many teachers
for general merit and for specific qualities. Then, by a method of
correlation, he worked out the relative significance of the qualities.
There are many technical improvements in this study over
Meriam's, but the general procedure is the same.

For fifteen years the teaching profession has been sensitive to 
problems of recruiting new members. As yet, however, no one 
knows the exact formula for success in teaching. The complexity 
of personality and character and the many-sidedness of teaching 
have continually baffled useful analysis. We know that several 
measurable traits are not essential to successful teaching, but we
do not know what traits must be present in superior instructors. 
The inspiring advance in the application of psychological methods 
to the selection of clerks, stenographers, machine operators, and 
fliers in industry together with similar success in vocational 
guidance in professional education, such as engineering and 
dentistry, increases our confidence in the hope that before long 
psychology will enable school administrators to select teachers 
with frequency and size of error far smaller than prevails at 
present. 

¹ For a discussion of this study, see the last part of Chapter V.



CHAPTER II 

METHOD AND DATA INVOLVED IN THIS STUDY 

An accurate rating of a sufficiently large number of teachers for 
general teaching ability must be obtained before any analysis of 
the significant qualities of teaching is possible. We must know 
who the good teachers are, who are the poor teachers and who are 
the fair teachers, before it is worth while to attempt to find out 
what facts are pertinent in judging their teaching skill. After we 
get a group of teachers who we know differ among themselves 
in general teaching ability, by certain amounts or units, then we 
may proceed, by a method of correlation, to find out what facts 
about them are of prognostic or diagnostic value. 

Such a rating of general teaching ability for 156 grade and 
high-school teachers who were at work in the public schools of 
Towns A, B, and C, in Massachusetts, during the school year 
1918-1919, has been obtained. There were six groups of teachers. 
Three of these groups were the grade teachers in Towns A, B, and 
C, and three were the high-school faculties in these towns. The 
number of teachers in each group follows: 

Town A   Grade teachers          53
         High-school teachers    15

Town B   Grade teachers          35
         High-school teachers    13

Town C   Grade teachers          30
         High-school teachers    10

Three separate ratings for general ability in teaching were ob- 
tained for each group. One rating was secured from the supervi- 
sors in each system for their respective teachers. Another was 
secured by the mutual judgments of the teachers themselves of each 
group. Another was secured by a consensus of pupils' opinions. 

In general the method which was used in deriving the ratings
was to have the several judges rate each teacher in the group
relative to the other members of the group for the broad quality
general ability as a teacher. The theory which underlies this
method is this: Where direct measurement in terms of amount is
impossible, measurements by relative position in a series may be
so controlled that possibly as exact and as true ratings may be
obtained as if units of amount had been used. It is assumed that
the amount of difference between two teachers who have been thus
judged will depend on the ease with which the differences are
observed by competent judges.

In using this general method of rating teachers for teaching 
ability, we have taken it for granted that the good teacher is the 
one whom competent judges rate as good. We hold throughout 
this study that the poor teacher is the one whom the judges have 
rated as poor. These hypotheses will presumably be acceptable 
to those who are familiar with the theory and practice of social 
measurements. It may be admitted that one could question the 
final truth that the opinions of any number of judges, however 
competent and harmonious they might be, necessarily establish 
the facts of teaching merit. Thus the really good teacher, it might 
be held, is the one who gets her pupils on fastest. 

To determine how much the progress of pupils is due to any one 
teacher is not possible by any method or information that is as yet 
available. Even to measure a pupil's total progress, much less 
the total progress of a class, is as yet a little venturesome. 

It might also be held that the amount of development of char- 
acter and morality in the pupils is the only test of good teaching 
and that what others think about the teacher is really irrelevant. 

To hold, on the other hand, that competent judgments of teach- 
ers, when properly combined, will give a very useful and approxi- 
mately true rating, as well as probably the best rating method 
that is now available, is only common sense. This rating method 
is entirely defensible. 

The good lawyer, after all, is the one who is considered a good 
lawyer by fellow-members of the bar. The poor dentist is the one 
to whom no other dentist would go or recommend anybody else. 
The great preacher is the one who attracts visitors. The good 
teacher is the teacher who is thought to be good. 

Where differences in skill among employed people must be
determined, judgments in terms of better than the average, poorer than
one's associates, and similar expressions, are useful measures of
ability. Of course, the final validity of the judgments may be
lessened by the presence of constant error in the opinion that is
offered, or by the incompetence or paucity of the opinions expressed,
or by the failure properly to combine the judgments after they
are obtained.

Of the three ratings (by the teachers themselves, which is
labeled "A," by the supervisors, "B," and by the pupils, "C"), we shall
take up first the ratings of the teachers which are indicated by the
judgments of their fellow-teachers.

PROCESS OF RATING TEACHERS, BASED UPON THE TEACHERS' ESTIMATES

Step 1. Teachers' meetings for each group were called and the 
teachers were asked to rate each other for general teaching ability, 
using the relative-position method. The ratings were not in 
terms of good, fair, poor, because what one teacher might consider 
good, another teacher who was more critical might consider only 
fair. This type of difference might run through the series of
judgments. 

The ratings were not secured in terms of how much below the best
teacher you have ever known, or the equivalent expressions, for
errors of an obvious nature are bound to creep into any such rating
system. The ratings were all given in terms of relative position
within the group itself. Thus, when the grade teachers of Town A
rated each other, every teacher placed in order of merit all the
teachers in the Town A group. The amount of difference between
the teachers in the final rating was determined by combining all
the judgments of the teachers. Each member of the six groups of
teachers, while in a teachers' meeting, rated those in the group to
which she belonged in a similar fashion and under similar
conditions with the same instructions. The instructions which were
given to the teachers follow:

INSTRUCTIONS TO TEACHERS

On this sheet you are requested to give certain ratings of each teacher in the
list, including yourself. Please rate every teacher and please be absolutely
frank in your ratings. You need not sign your name. Nobody will ever know
how you or anybody else rated him. No personal use will ever be made of any
of these ratings. They will be used in a purely scientific study to determine the
significance of age, education, early interests, etc., etc., for success as a teacher.
The names will all be cut off and destroyed as soon as the different items in the
inquiry have been numbered to fit the ones to whom they refer. Also, do not
feel disturbed because in each respect somebody has to be rated lowest. These
ratings are all relative, and the lowest teacher in the group may well be of very
great ability. Please be sure to record ratings, even if they seem to you to be
little better than mere guesses. The opinions of twenty men give a useful
rating, even if any one of the twenty taken alone is almost worthless.

On the sheet is a list of the teachers. Choose the teacher of greatest teaching
ability and write 1 after his or her name in Column 1. Choose the teacher next
below in teaching ability and write 2 after his or her name in Column 1. Write
3 after the name of the one next below in teaching ability, and do so for 4, 5, 6,
etc. If two or more seem absolutely equal in teaching ability give them the
same rating.¹

After the teachers had read the instructions carefully a few 
minutes were allowed them for asking any questions that might 
occur. When it was clear that the teachers understood what was 
wanted of them, they proceeded with the rating. No names 
were signed to the rating sheets. It was evident that honest and 
sincere opinions were expressed. The resultant ratings of each of 
the six groups of teachers were then examined. Those sheets 
which were incomplete or did not sufficiently distribute the ratings 
were discarded. This lack of usable material was not at all 
great. 

The teachers found that rating each other was a method of 
polite gossip and was evidently more or less enjoyable. For each 
set of teachers sufficient material was obtained. The spread or 
range between the poorest and the best teacher was large. In 
many cases it was as great as the number of teachers involved. 
The useful ratings (97) were distributed as follows:

Town A   Grade teachers          30 ratings
         High-school teachers    14    "

Town B   Grade teachers          16    "
         High-school teachers    10    "

Town C   Grade teachers          18    "
         High-school teachers     9    "

         Total                   97    "

Step 2. Each set of ratings was then divided into two halves by
chance drawings. Each half has been treated separately throughout
this study. These halves will be referred to as Group A and
Group B. Carrying two groups makes it possible to correct the
correlations for attenuation and also shows the reliability of
the judgments.
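
The use of chance halves may be illustrated by a small sketch. It is
offered under assumptions and is not the computation used in this
study: the rating sheets below are hypothetical, each half is
summarized by the mean rank a teacher receives (the study itself
combines judgments by a different scale), and the correction for
attenuation is the familiar formula that divides an observed
correlation by the square root of the product of the two
reliabilities.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def chance_halves(sheets, seed=0):
    """Divide the rating sheets into two halves by chance drawing."""
    shuffled = list(sheets)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def mean_ranks(sheets):
    """Mean rank per teacher; a missing rating (None) is skipped."""
    n_teachers = len(sheets[0])
    return [
        mean(s[t] for s in sheets if s[t] is not None)
        for t in range(n_teachers)
    ]


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Observed r divided by the root of the product of reliabilities."""
    return r_xy / sqrt(r_xx * r_yy)


# HYPOTHETICAL sheets: each row is one judge's ranking of five teachers.
SHEETS = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
    [1, 2, 4, 3, 5],
    [2, 1, 3, 4, 5],
    [1, 2, 3, 5, None],
]

if __name__ == "__main__":
    group_a, group_b = chance_halves(SHEETS, seed=1)
    r_halves = pearson(mean_ranks(group_a), mean_ranks(group_b))
    print(f"correlation between chance halves: {r_halves:+.3f}")
```

A high correlation between the two halves is evidence that the
judges agree among themselves, which is the sense in which the
reliability of the judgments is shown.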

A transcript (see Table I) of fifteen of the ratings for general
teaching ability of Town A grade teachers has been made. These
ratings compose one group (Group B) of the mutual judgments
which is treated later to get a single rating of teachers. The
columns include the complete ratings of fifteen different judges.
Thus, by looking at the first column one will see that one judge
rated Ah — as the 21st best teacher of the group, Dr — as the 6th
best, Sm — as the best, Hi — as the 3rd, El — as the 21st, and so on.

¹ The complete instructions are given on pp. 46-48.

The numbers opposite each teacher's name show the ratings 
received. Thus, Ah — was rated by one teacher as 21st, by an- 
other as 2nd, by another as 1st, by another as 11th, by another as 
1st, and so on. In this way, we have approximately 750 ratings 
of teachers in a group in terms of relative worth. The ratings by 
fifteen other teachers of the teachers of this group were similarly 
obtained and transcribed. 

We have the judgments of every teacher on every teacher. 
Occasionally a judge failed to rate some teacher. This was due 
to the lack of acquaintance with that teacher. This kind of omis- 
sion is an index of thoughtful estimate, because it indicates the 
fact that a judge who had no opinion gave no rating rather than 
record a mere guess. The trustworthiness of these judgments will 
be discussed later.¹

Step 3. We now have the relative ranking of each teacher in
the opinion of fifteen judges. This is a chance half of all of the
ratings; namely, Group B. Group A was similarly obtained.

The next step is to combine these two groups of ratings into a
single rating. The theory which underlies the procedure requires
some explanation. We know several facts about the resultant
and combined rating, even before we make it.

First, the final relative arrangement will not be the result of any 
one individual judgment, since it will be the product of the ratings 
of fifteen judges. The bias of any single person who has served as 
a judge will not operate unduly to influence the final result. The 
fifteen sets of ratings were chosen by chance and chance errors of 
overestimation or underestimation of any teacher, because of the 
particular friendship of or dislike for any teacher, by a particular 
judge, will be offset by opposite chances. 

Second, except for a negligible chance in the drawing of the
fifteen ratings for Group A or Group B, there will be no constant
error, due to the fact that the judges may know some of the teachers
very well and others only slightly. No one judge will know all
the teachers equally well; but the teachers who are well known
to some of the judges will be those who are not so well known to
others. Then, too, those teachers who are little known to some
judges will be well known to others. Thus, intimate knowledge
and lack of knowledge on the part of those who are judging the
teachers will be somewhat evenly spread over the whole list.

¹ See page 17.

TABLE I

A Transcript of Fifteen Ratings for General Teaching Ability
Group B, Town A, Elementary Grade Teachers

                          Ratings by Judges
             1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 1  Ah—     21   2   1  11   1  21   1  ..   1   1   1   2  11   2   7
 2  Dr—      6   1   3   1  ..  10   2  ..   2  ..   2   1   3  52   2
 3  Sm—      1  ..  ..   3   1   1   2  18   2   2  ..   1   5  46   1
 4  Hi—      3  15  ..  16  ..   3   4  ..  ..  ..   2   1   7  19   3
 5  El—     21   3   8  25   2  26   1   2   2   4   2   2  15  22   8
 6  Co—     23  ..   5   5   1   8   1   6   2   3  22   1  34  32  20
 7  Sy—     ..  15   2  13   1  17   2   1   2   4  24   2  21  16  ..
 8  Pa—     11  14   6  17   1  13   1   4   2   3  12   5   1  11   6
 9  Sp—     10  14  10  14   1  18  11   8   2   2   2   3   6  15   2
10  Le—     28  ..   1   2   1  11  ..   7   2   4   2   1  17  17  15

¹ It was proper to give two teachers the same rating if their ability seemed
equal.

Better-Worse Judgments Based on Table I

Read: 1 is judged worse than 2, 7 times with 0 equal or tied votes; 1 is judged
worse than 3, 6 times with 1 equal or tied votes, and so on.

 a    b    c    d          e    f    g    h
 1    2    0 -  7          5    6    2 -  6
      3    1 -  6               7    3 -  6
      4    1 -  4
 2    3    3 -  4          6    7    2 -  4
      4    2 -  3               8    4 -  6
      5    2 -  2               9    2 -  6
 3    4    1 -  1          7    8    2 -  7
      5    1 -  3               9    2 -  5
      6    3 -  3              10    3 -  6
 4    5    1 -  2          8    9    3 -  6
      6    1 -  2              10    3 -  4
      7    3 -  2              12    1 -  5

Columns a, b, e, f refer to teachers by number. Columns c, g record tied
votes. Columns d, h record worse votes.


Third, a fairly minute scale will be possible, because the ratings 
are spread about as widely as the number of teachers who have 
been judged. Many of the ratings which were presented by the 
teachers had a spread of over 40, and the number of teachers who 
had been judged was 52. 

In any scale which is based on relative position, it is well known 
that absence of personal bias, absence of constant error, and a 
large number of judges are the chief desiderata. In these ratings 
such desiderata are present. 

THE LOGIC OF THE METHOD 

We must have at our command a technique, not only of chang- 
ing measures of relative position into measures of units of amount, 
but also of combining incomplete judgments of relative positions 
into units of amount. Our problem, in simplified form, is to
change differences which are noticeable to competent judges into 
differences of units of amount. A concrete application follows: 

If, in the opinion of judges, person A is better than person B, we 
must find out how much better. Suppose that, in judging A and B, 
we have ten opinions. If five of the judges think that A is better 
than B and five think that B is better than A, then we are justi- 
fied in calling the matter a draw and rule that A and B are equal. 

Suppose, however, that six think A better than B, and only four 
think B better than A, then we are justified in holding that A is 
better than B. The question now is: By how much is A better 
than B? The percentage of judges who notice a difference be- 
comes the basis of our procedure. 

It is reasonable to suppose that differences which are noticed 
equally often are equal in amount. We assume that all judgments 
are of equal value. Further, we arbitrarily define as one unit of 
difference that difference which 75 per cent of the judges notice. 

Thus, if 100 judgments are made comparing A and B and 75 
vote A to be better than B, then A is better than B by one unit. 
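
The conversion from the percentage of judges to units of difference
may be sketched as follows. The text fixes only two points: an even
division of opinion means no difference, and a 75 per cent vote means
one unit. The use of the normal probability curve for intermediate
percentages, and the counting of a tied vote as half a vote for each
side, are assumptions made here for illustration. The function name
is likewise hypothetical.

```python
from statistics import NormalDist

_STANDARD = NormalDist()                       # mean 0, standard deviation 1
_DEVIATE_AT_75 = _STANDARD.inv_cdf(0.75)       # about 0.6745


def units_of_difference(votes_for_a, votes_for_b, ties=0):
    """Amount by which A exceeds B, in units where a 75 per cent vote is 1.

    ASSUMPTION: intermediate percentages are converted by the normal
    curve, and each tie counts as half a vote for either side.
    """
    total = votes_for_a + votes_for_b + ties
    if total == 0:
        raise ValueError("no paired judgments to compare")
    proportion = (votes_for_a + 0.5 * ties) / total
    if proportion <= 0.0 or proportion >= 1.0:
        raise ValueError("a unanimous vote gives no finite difference")
    return _STANDARD.inv_cdf(proportion) / _DEVIATE_AT_75


if __name__ == "__main__":
    for a, b in [(5, 5), (6, 4), (75, 25)]:
        print(a, b, round(units_of_difference(a, b), 3))
```

Under these assumptions a vote of five to five gives a difference of
zero and a vote of seventy-five to twenty-five gives exactly one unit,
as the definition requires; a vote of six to four falls between.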

When the data are incomplete, the only thing to do is to
compare those judgments which are complete and disregard the
judgments which are not paralleled with similar judgments of the
person compared.

As an illustration, let us turn back to the judgments of the
Town A grade teachers (Table I). In comparing Ah — with Dr —,
we shall neglect the fifth judgment of Ah — because it is not
paralleled with one of Dr — , but we shall not neglect it in
comparing Ah — with El — , for both have ratings given by the
same judge.

We also give to the teacher who has been rated the lowest an 
arbitrary value of one unit. From this as a base we build up the 
values of the series. It is obvious that the more judgments there 
are, the better will be the final rating. The more competent the 
judges are, the more dependable the final rating will be.

These marks will be used as measures only in respect to their 
differences. Thus there is a difference of 10 units between a 
teacher rated 15 and one rated 25. But it would not be correct 
to think of a teacher rated 20 as twice as efficient as one rated 10 or 
half as good as one rated 40. These quantitative measures of
efficiency cannot be compared on a basis of multiplication or
division. We are interested in the differences and in no other
mathematical relations. Thus, if teachers A, B, C, are rated 10, 20, 30,
we know that they vary in ability by equal amounts. For our
purposes, it would make no difference if the measures were 110,
120, 130, or 610, 620, 630, or 750, 760, 770, as long as the
quantitative differences should be preserved.

If we know where the true zero of teaching efficiency is, in
relation to the values 10, 20, and 30, then, of course, other
mathematical relations can be immediately worked out; but we do not
know where the true zero point lies.

APPLICATION OF THE METHOD 

In computing the final scale, we begin by making a rough ap- 
proximation of the probable order by inspection or by computing 
the median rank of each teacher for her fifteen ratings. From an 
inspection of the Town A grade ratings, it is easily seen that Ah — 
will be better than Co — , for example, since Ah — 's median rating is 
6, while Co — 's is 11+. The rough approximation is then refined. 
This approximation is done by comparing each teacher with the 
one next above and next below and by continuing the process for 
several places each way. Usually three places will be sufficient. 
This refinement is continued by finding the percentages of the
judgments which are in favor of each teacher and the percentages
of the judgments which rate numerically lower each teacher
in comparison with those near her. In every case only those 
judgments are used in which both teachers who are compared are 
also rated. The approximate arrangement, in some cases, will 
be wrong. Then the teachers must be shifted in order. By using 
the unit values which have been calculated from the table that 
corresponds to the percentage differences, a scale of amount of 
difference between the teachers can be built up. 
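
The first, rough stage of this procedure can be sketched in a few lines. The sketch is an illustration only: the function name rough_order, the use of None for a missing rating, and the sample ranks are assumptions made here and are not taken from Table I.

```python
from statistics import median

def rough_order(ratings):
    """Return teacher names sorted from best to worst by median rank.

    `ratings` maps each teacher to a list of ranks, one slot per judge.
    None marks a judge who did not rate that teacher. A lower rank
    number means a better teacher.
    """
    medians = {
        name: median(r for r in ranks if r is not None)
        for name, ranks in ratings.items()
    }
    return sorted(medians, key=medians.get)

# Hypothetical ranks from five judges (not the study's data).
sample = {
    "Ah": [6, 5, 7, 6, None],
    "Co": [11, 12, 10, 11, 12],
    "Dr": [4, 6, 5, None, 5],
}
order = rough_order(sample)
print(order)
```

The order so obtained is only a starting point; the pairwise comparisons described above may still shift individual teachers.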

In actual procedure it is better to begin with the worst teacher 
and work up. A procedure of trial and success, with frequent 
shiftings back and forth, will be found more economical of time 
and patience than a more complicated method. 

Referring back to Table I, we find that in comparing Ah — with 
Dr — the fifth rating must be disregarded, as it is incomplete. 
In seven cases Ah — is higher numerically or worse actually than 
Dr — . There are twelve usable judgments in all. This gives 7 
out of 12 votes, as it were, against Ah — . The percentage is then 
found and its corresponding unit value. It is well to determine
the percentage differences, not only between a teacher and the 
next one to her, but also for three teachers away. In this way any 
individual, if rightly placed, will be above not only the one next 
lower, but also above the three next in order. Where mistakes in 
order occur, then there must be some shifting. 
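
The counting of votes for a single pair of teachers may be sketched as follows. The function name, the None convention for an unparalleled judgment, and the ranks themselves are hypothetical; they are chosen only so that the count comes out, as in the text, at 7 votes out of 12 against the first teacher. A tie is counted as half a vote each way, which is one reading of the rule that tie ratings are split.

```python
def share_worse(ranks_a, ranks_b):
    """Return (share, usable): the fraction of usable judgments in which
    teacher A is ranked worse than teacher B, and the number of usable
    judgments.

    Ranks are aligned by judge. Only judges who rated both teachers are
    counted. A higher rank number is worse; a tie is half a vote.
    """
    worse = 0.0
    usable = 0
    for a, b in zip(ranks_a, ranks_b):
        if a is None or b is None:
            continue  # judgment not paralleled: disregard it
        usable += 1
        if a > b:
            worse += 1.0
        elif a == b:
            worse += 0.5
    if usable == 0:
        raise ValueError("no judge rated both teachers")
    return worse / usable, usable

# Hypothetical ranks from thirteen judges; the fifth judge did not rate B.
ranks_a = [6, 7, 5, 8, 6, 4, 7, 6, 5, 7, 3, 6, 4]
ranks_b = [5, 6, 4, 7, None, 5, 6, 5, 6, 6, 4, 7, 5]
share, usable = share_worse(ranks_a, ranks_b)
print(usable, round(100 * share))
```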

Two exceptions should be noted: Tie ratings are split, and
when there is a 100 per cent agreement that one teacher is better
than the next, then there is theoretically an infinite, or at least an
unknown, amount of difference between them. In this case
common sense is as good a guide as any. Either the statistician
can make two or three indirect comparisons through other
teachers' ranks in comparison with the two in question, or he can
assign a value to 100 per cent a little larger than that assigned to 99
per cent. The latter procedure was followed here. As the
amount of difference for a 99 per cent to 1 per cent vote is
3.45 units, we arbitrarily assign 4.00 to a 100 per cent to 0 per
cent comparison.

While no doubt the explanation seems complex, the procedure, 
if followed as described above, will readily yield a well-made 
scale. Turning back to the transcript of ratings (see Table I), 
which were made by the Town A teachers, we find that the
"Better-Worse" columns give in detail the comparisons in judgments.
It reads: 1 is worse than 2 (with no equal or tied votes) 7 times; 
1 is worse than 3 (with one equal or tied vote) 6 times. Of course, 
when the worse votes are less than the better, the table is used in 
the same way. 

Table II shows, in part, the amount of difference in percentages 
from the worst to the best, with comparisons two or three places 
removed for teachers of Group A, grade teachers, Town A. The 
numbers down the table and across are key numbers to the teach- 
ers' names, as has been noted in Table I. The table reads: 
Teacher 2 is better than teacher 1 by 58 per cent; teacher 2 is 
better than teacher 3 by 61 per cent; teacher 1 is better than 
teacher 3 by 54 per cent; teacher 8 is better than teacher 7 by 57 
per cent; and so on. 

TABLE II

Amount of Difference in Percentages
Group A, Town A Grade Teachers

1
3
6     73
4     84  05  58  68
8     81
7     75  50  50  57
5     77  75  73  77  54  54
10    54  65
12
12
12    54  64
19    73  64  67  58  55  44  54
12    57  50  53
16    45
16    54  57
2
2
      58
      61  54
      62  58
      50  57

Key numbers: 1, 3, 6, 4, 8, 7, 5, 10, 12, 12, 12, 19


Step 4. The final step in getting a quantitative ranking is to
change the percentage differences into amounts of difference. Here
we use the table of percentage differences (see Table III). The
table shows the unit values that correspond to percentage
differences and is reproduced from Thorndike's Mental and Social
Measurements, page 123.

TABLE III

The Amounts of Difference (x - y) Corresponding to Given Percentages
of Judgments That x>y.

%r = the percentage of judgments that x>y. Δ/P.E. = x - y, in
multiples of the difference such that %r is 75.

 %r   Δ/P.E.     %r   Δ/P.E.     %r   Δ/P.E.     %r   Δ/P.E.     %r   Δ/P.E.

 50    .00       60    .38       70    .78       80   1.25       90   1.90
 51    .04       61    .41       71    .82       81   1.30       91   1.99
 52    .07       62    .45       72    .86       82   1.36       92   2.08
 53    .11       63    .49       73    .91       83   1.41       93   2.19
 54    .15       64    .53       74    .95       84   1.47       94   2.31
 55    .19       65    .57       75   1.00       85   1.54       95   2.44
 56    .22       66    .61       76   1.05       86   1.60       96   2.60
 57    .26       67    .65       77   1.10       87   1.67       97   2.79
 58    .30       68    .69       78   1.14       88   1.74       98   3.05
 59    .34       69    .74       79   1.20       89   1.82       99   3.45
                                                                100¹  4.00

¹ Arbitrarily taken.
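
The printed values behave as deviations of the normal curve expressed in multiples of the deviation at 75 per cent (the probable error). On that assumption, which the study itself does not spell out, the conversion can be sketched as below. The name unit_value is invented here, and the treatment of 100 per cent simply follows the arbitrary 4.00 adopted above.

```python
from statistics import NormalDist

_STD = NormalDist()               # standard normal curve (assumed model)
_Z_AT_75 = _STD.inv_cdf(0.75)     # deviation implied by 75 per cent agreement

def unit_value(per_cent, value_at_100=4.00):
    """Convert a percentage of judgments (50 to 100) into units of amount.

    One unit is the difference that 75 per cent of the judges notice.
    Complete agreement has no finite deviation, so an arbitrary value
    is returned for 100 per cent.
    """
    if not 50 <= per_cent <= 100:
        raise ValueError("per_cent must lie between 50 and 100")
    if per_cent == 100:
        return value_at_100
    return _STD.inv_cdf(per_cent / 100) / _Z_AT_75

for p in (50, 60, 75, 90, 99, 100):
    print(p, round(unit_value(p), 2))
```

A percentage below 50 is handled by reversing the comparison, so that the difference is counted in favor of the other teacher.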



TABLE IV

Differences in Amount of Difference for the Lowest Thirteen
Teachers in Town A, Grades, as Judged by Group A

No.      Name     Difference     Per Cent     Amount of Per Cent
                  of Amount                   by Table III

49       Cd—        1.00
50       Rd—        1.37            60             .376
47       Dn—        1.37            50             .000
47       Rt—        1.86            63             .492
43       De—        2.32            62             .453
43       Cl—        2.69            60             .376
4[?]     Pm—        2.88            55             .186
43       Fm—        3.03            54             .149
41       Jr—        3.21            55             .186
[?]7     By—        3.21            50             .000
40       Me—        3.71            63             .492
37       My—        4.12            61             .414
[?]      Ws—        4.34            56             .224

Table IV shows the differences in amount of difference for the
lowest thirteen teachers in the grades of Town A as judged by
Group A.

In this way we build up a scale of teaching ability in terms of 
amount, based upon fifteen sets of judgments. In actual practice
one need not carry the differences in terms of amount to more than 
the first decimal place. This scale may now be used as a rating 
scale of the teachers with which to correlate any other significant 
scale of those teachers. Thus we may take their ages and, by 
correlation, find out whatever influence age may appear to have, 
or we may take professional training, salary, etc. 
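
The building up of the scale from the lowest teacher can be sketched as a running sum. The sketch assumes that the unit values are normal-curve deviations in multiples of the 75 per cent deviation; the function names are invented here. The percentages and the printed amounts are those of Table IV, and the computed values agree with the printed ones to within about one hundredth of a unit.

```python
from itertools import accumulate
from statistics import NormalDist

_STD = NormalDist()
_Z_AT_75 = _STD.inv_cdf(0.75)

def unit_value(per_cent):
    """Units of amount for a percentage between 50 and 99 (assumed model)."""
    return _STD.inv_cdf(per_cent / 100) / _Z_AT_75

def build_scale(per_cents, base=1.0):
    """Scale values from worst to best.

    The lowest teacher receives the arbitrary value `base`; each later
    teacher adds the unit value of the percentage by which she is judged
    better than the teacher just below her.
    """
    steps = (unit_value(p) for p in per_cents)
    return list(accumulate(steps, initial=base))

# Per-cent column of Table IV (second teacher through thirteenth).
table_iv_per_cents = [60, 50, 63, 62, 60, 55, 54, 55, 50, 63, 61, 56]
# Difference-of-amount column of Table IV as printed.
printed = [1.00, 1.37, 1.37, 1.86, 2.32, 2.69, 2.88,
           3.03, 3.21, 3.21, 3.71, 4.12, 4.34]

scale = build_scale(table_iv_per_cents)
for computed, shown in zip(scale, printed):
    print(f"{computed:.3f}  {shown:.2f}")
```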

POSSIBLE ERRORS

In constructing a scale of this kind there is the possibility that 
two types of errors may be met; namely, constant errors and 
variable errors. 

For constant errors we cannot compensate. A constant error 
would be a universal tendency on the part of the judges, for ex- 
ample, to rate high those teachers who were graduated from 
Harvard and to rate low those teachers who were graduated from
Yale, because it might popularly be thought that graduation 
from Harvard signified something that graduation from Yale 
did not, when in point of fact it makes no difference which college 
the teachers had attended. Another more naive type of constant 
error would be to rate high certain teachers because they are 
blondes and to rate low other teachers because they are brunettes, 
when complexion is not a determining factor. If all judges should 
consistently err in some such ways as these, then we would have a 
constant error. 

Variable errors are entirely taken care of statistically. Errors 
of this type operate when the judges do not know all teachers
equally well. They also occur in reporting clerical mistakes that 
are made. These variable errors balance each other and by 
proper treatment are either eradicated or at least exposed. 

The most frequent suspicion of the validity of this method of
rating is due to the feeling that teachers do not know each other
and, therefore, cannot judge each other. Teachers, however,
receive a fairly good idea of each other through conversations, the
remarks of pupils, general reputation, appearance in the halls,
teachers' meetings, and other sources.

While it is quite probable that teachers do not know, in minute
particulars, whether other teachers are good or poor, it would be
unwise to claim that teachers do not know pretty well whether
their associates are successful or not.

We have seen how a quantitative ranking of the grade teachers
of Town A by a chance half of the judgments was obtained. The
quantitative ranking of the same teachers by another chance half
of the judgments was similarly made. For the high-school teachers
of Town A, for the grade and high-school teachers of Town B
and Town C, respectively, exactly the same computations were
made. We have, then, two sets of rankings for the teachers of
these six groups.

THE DEPENDABILITY OF THE DATA 

If there is high agreement between two chance halves of the 
judges, such an agreement is evidence of the reliability of the 
data. The following paragraphs are concerned with establishing 
the reliability of the judgments upon which the ratings are 
based. 
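
The idea of comparing two chance halves of the judges can be sketched as follows. This is a simplification and not the study's procedure: each half is summarized by the mean rank it gives each teacher, where the study builds a full scale in units of amount. The function names, the seed, and the ranks are hypothetical.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no spread has no correlation")
    return sxy / sqrt(sxx * syy)

def split_half_agreement(rank_table, seed=0):
    """Correlate the mean ranks given by two chance halves of the judges.

    rank_table[j][t] is the rank judge j gave teacher t (every judge is
    assumed here to have rated every teacher).
    """
    judges = list(range(len(rank_table)))
    random.Random(seed).shuffle(judges)      # chance division of the judges
    half = len(judges) // 2
    halves = (judges[:half], judges[half:])
    n_teachers = len(rank_table[0])
    means = [
        [sum(rank_table[j][t] for j in group) / len(group)
         for t in range(n_teachers)]
        for group in halves
    ]
    return pearson_r(means[0], means[1])

# Six hypothetical judges ranking five teachers, with small disagreements.
ranks = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
    [1, 2, 4, 3, 5],
    [1, 2, 3, 5, 4],
    [1, 3, 2, 4, 5],
    [1, 2, 3, 4, 5],
]
agreement = split_half_agreement(ranks)
print(round(agreement, 3))
```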

AGREEMENT BETWEEN TWO GROUPS OF TEACHERS WHO JUDGE
THEMSELVES FOR GENERAL TEACHING ABILITY

The agreement between Groups A and B for general teaching
ability is shown by the following correlations:

                                          Correlations

Town A    Grade teachers (63)             +.941, ±.016
          High-school teachers (15)       +.894, ±.05

Town B    Grade teachers (36)             +.906, ±.03
          High-school teachers (13)       +.894, ±.06

Town C    Grade teachers (30)             +.813, ±.06
          High-school teachers (10)       +.664, ±.17

These are raw correlations. The highest, +.941, was from the 
largest group, the next highest from the next largest group, and 
the lowest correlation from the smallest group. The average 
correlation, with weighting for size of group, is +.882 ± .01.
While it is a fact that the larger group and the higher correlation 
go together, this fact should not be taken to mean more than it 
actually does. We know in general that the smaller the group the 
more effective becomes the influence of error and that findings 
for large groups are usually more dependable than those for small. 
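
An average weighted for size of group can be sketched as below. The study does not state its exact weighting; the sketch assumes the simplest form, in which each coefficient is weighted by the number of teachers in its group. On that assumption the six coefficients above give about +.886, which is close to, but not identical with, the printed +.882. The function name is invented here.

```python
def weighted_mean_r(groups):
    """Mean of correlation coefficients, each weighted by its group size.

    `groups` is a list of (number_of_teachers, r) pairs.
    """
    total = sum(n for n, _ in groups)
    return sum(n * r for n, r in groups) / total

# (group size, r between Groups A and B) as printed above.
groups = [
    (63, 0.941), (15, 0.894),   # Town A: grades, high school
    (36, 0.906), (13, 0.894),   # Town B
    (30, 0.813), (10, 0.664),   # Town C
]
average = weighted_mean_r(groups)
print(round(average, 3))
```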




The average correlation of +.882 ± .01 shows that there is a
high degree of resemblance between the findings of one group and 
those of the other. In other words, if the correlation had been 
zero, then there would have been nothing but a chance agreement 
as to who was the good teacher and who was the poor teacher. 
If there had been a correlation of —1.00, it would have implied 
that a teacher who was highly esteemed by one group would have 
been thought meanly of by the other group. If there had been a 
correlation of +1.0, it would have implied that there was per- 
fect agreement between the two groups in their estimates of 
teachers. 

In a range of -1.0 to +1.0, an average correlation of +.899 ± .01
is seen to mean an amount of agreement that is, indeed, very
significant. This inner consistency may be taken to connote
that teachers' estimates of each other are by no means a
hit-and-miss affair, but that there is a practical unanimity of opinion
concerning teaching ability. Errors in judgment tend to lower a
correlation and there could not be very much of chance guessing
in a set of judgments which correlate with a similar set as highly
as +.899 ± .01.

AGREEMENT BETWEEN TWO GROUPS OF TEACHERS AND THE
SUPERVISORS' JUDGMENTS WHO JUDGE THE SAME TEACHERS FOR
GENERAL TEACHING ABILITY

In the Town A judgments, the supervisory force consisted 
of the superintendent, the principals who spent full time in
supervision, the supervisors of music, drawing, and physical 
education, health officers, and two school-board members who 
were especially well informed concerning the teachers. In Towns 
B and C the judgment of the superintendent alone was available. 
There is every reason, however, to assume that these judgments 
were of a high order. 

The second method of making an estimate of teaching ability
was to have the supervisors rate each teacher. This was done by
the relative-position method. The statistical procedure was
similar to that which was used in deriving the ranking of teachers
from the summation of the ratings of the teachers.¹

¹ These ratings are reported in full in the data sheets which are filed at
Teachers College, Columbia University.

By correlation we find the following agreement between the
ratings of teachers of each other and corresponding ratings by
the supervisors:

                                   r between Supervisors    r between Supervisors
                                   and Group A Teachers     and Group B Teachers
                                   for General Teaching     for General Teaching
                                   Ability                  Ability

Town A    Grade teachers           +.934, ±.01              +.999, ±.00
          High-school teachers     +.930, ±.03              +.606, ±.16

Town B    Grade teachers           +.976, ±.00              +.976, ±.00
          High-school teachers     +.972, ±.01              +.833, ±.08

Town C    Grade teachers           +.969, ±.01              +.913, ±.03
          High-school teachers     +.912, ±.05              +.761, ±.13

Average                            +.974, ±.00              +.951, ±.00

Averaging the correlations +.974 and +.951, for the correla- 
tion between the judgments of supervisors and teachers in their 
rating for general teaching ability, we get the coefficient of corre- 
lation +.962. These figures may mean that the teachers judge 
as they do, because they know in a general way what the super- 
visors think and therefore make their ratings agree as far as they 
can with those which they think the supervisors will give. Or 
they may mean just the opposite. More reasonable is the 
opinion that teachers and supervisors alike have access to the 
same information and therefore form similar judgments from 
a consideration of similar data. 

Whatever may be the explanation of the high correlation be- 
tween the judgments of the teachers themselves and their super- 
visory officers, the important thing is that the correlation is high. 
If supervisors can form a fair ranking of teachers, then teachers 
can rank themselves, as is shown by the high correlation (+.962)
between the Group A plus Group B teachers and the supervisors 
who judged for general teaching ability. 



AGREEMENT BETWEEN PUPILS' JUDGMENTS OF TEACHERS AND
OF TEACHERS AND SUPERVISORS WHO JUDGE THE SAME
TEACHERS

There was one group of pupils (nearly 200), in the grades of
Town A, who were receiving instruction from eleven different
teachers. These pupils rated their teachers under dignified and
respectful circumstances. The exact method will be explained
in a later connection. The correlation between the scale values
which these eleven teachers received in the rating by mutual
judgments and the pupils' ratings was +.681. The high-school
faculties of Towns A and C were similarly rated, with the
resulting coefficients of correlation between mutual judgment rating
and pupils' ratings of +.807 for Town A high school, and +.684
for Town C high school.

Dividing the pupils' ratings into two chance groups (A and B) 
and correlating with the supervisors' estimates, we get the fol- 
lowing results: 

                                   r Group A Pupils'        r Group B Pupils'
                                   Estimates and            Estimates and
                                   Supervisors' Estimates   Supervisors' Estimates

Town A    Grade teachers           +.876                    +.656
          High-school teachers     +.682                    +.730

Town C    High-school teachers     +.631                    +.738

Note. The pupils' ranking of teachers in the grades of Town C could not
be obtained. The grades were not departmentalized and hence pupils were
acquainted with too few teachers.

From three separate sources, scales, in units of amount, for the 
general teaching ability of the teachers, have been obtained. 
The correlations are of such a nature that one is warranted in 
assuming that the ratings which have been given by either group 
of the teachers themselves are dependable ratings of general 
teaching ability.



CHAPTER III 

MEASURABLE FACTS RELATED TO GENERAL 

TEACHING ABILITY 

We have now a rating for general teaching ability for 156 teach- 
ers. The data which show the correlation between success in 
teaching and certain measurable facts concerning teachers will 
now be presented. As a measure of teaching ability both the 
teachers' mutual ratings of Group A and the supervisors' ratings 
will be used. In all instances the correlations are computed by 
the Pearson formula. 
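
For readers who wish to repeat such computations, the Pearson product-moment formula can be sketched as follows. The function name and the sample figures are hypothetical and are not the study's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation.

    r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx and dy are
    deviations of each series from its own mean.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 cases")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no spread has no correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical scale values of teaching ability and ages for five teachers.
ability = [1.0, 1.4, 1.9, 2.3, 2.7]
age = [30, 24, 41, 28, 35]
r_ability_age = pearson_r(ability, age)
print(round(r_ability_age, 3))
```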

THE SIGNIFICANCE OF AGE 

The coefficients of correlation between teaching ability and age 
for each of the six groups of teachers have been computed. The 
age at the last birthday has been taken. Fractional parts of a 
year have not been used. It has not been possible to check the 
correctness of the ages of all the teachers, but those teachers who 
were members of the State Pension Fund, and practically all the
teachers were members, were checked from the affidavits which 
are on file at the office of the Fund. The ages were of April, 1919. 

Typical examples of distributions, for Town A, grade teachers, 
are inserted on the following page. 

It would seem that a teacher's age is not a very good index of 
her general teaching ability. This same negligible and often 
negative correlation has been found in other studies even when a 
more lenient method of determining teaching ability has been 
used. 

There is one factor, however, which may have some effect on 
the correlation. In Town A and in Town B no teacher is at pres- 
ent employed who has not had elsewhere two years of successful 
experience. The rule has been in force for two years. On
account of the exclusion, for even this short time, of very young
teachers, the correlation may have been affected as it would not 
have been affected in other school systems. 

The coefficients of correlation, which have been given above,
should not be carelessly taken to mean that there is no
relationship between general teaching ability and age. Obviously, a
child could not teach. Excessive old age, on the other hand, is
not a negligible factor in determining general teaching ability.
Within the limits of ages at which people actually do teach, age
appears to be an irrelevant factor.

1. Distribution by Ages in Years:                  No. of Teachers

   25 or under                                            10
   25 to 30                                               18
   30 to 35                                                4
   35 to 40                                                7
   40 to 45                                                3
   45 to 50                                                5
   50 to 55                                                2
   55 to 60                                                1
   60 or over                                              2

2. Distribution by Years of Experience:

   Less than 5                                            11
   5 to 10                                                20
   10 to 15                                                3
   15 to 20                                                4
   20 to 25                                                4
   25 to 30                                                4
   Over 30                                                 6

These are grouped distributions. In computing the correlations, the actual
fact was used in each case. Grouping of this kind, however, is sufficient to
show the facts of distribution.

3. Distribution by Amount of Professional Study while in Service:

   No professional study                                  33
   Professional study equivalent:
     To one summer-school session of work in education    12
     To two summer-school sessions of work                 6
     To three summer-school sessions of work               1

                                   r of Age with Group A    r of Age with
                                   Rating of Teachers'      Supervisors'
                                   Judgments for General    Estimates of General
                                   Teaching Ability         Teaching Ability

Town A    Grade teachers           +.191                    +.047
          High-school teachers     -.151                    -.001

Town B    Grade teachers           +.050                    -.100
          High-school teachers     +.525                    +.422

Town C    Grade teachers           -.050                    -.108
          High-school teachers     +.604                    +.336

Average (weighted for number
in each group)                     +.135, ±.07              +.0298, ±.07

THE SIGNIFICANCE OF EXPERIENCE

The factor of experience has been studied in two ways — (1) 
total experience in teaching, wherever that experience has been 
gained, and (2) the experience gained in the present school sys- 
tem or position. No significant mutual relationship appeared. 
These correlations may be affected by the fact that teachers who 
are fresh from the normal schools are not engaged. In these 
data experience as such mattered little. While it is clear that a 
teacher as she becomes older does not necessarily become better 
by any process of inner growth, it is also clear that the older 
teacher is not necessarily the poorer one. As far as these data 
reveal the true situation, neither amount of experience nor age 
should be considered factors of large significance in the assurance
of teaching success.

                                   r between Total          r between Total
                                   Experience and General   Experience and General
                                   Teaching Ability as      Teaching Ability as
                                   Determined by Group A    Determined by
                                   Teachers' Judgments      Supervisors' Judgments

Town A    Grade teachers           +.018                    +.102
          High-school teachers     -.079                    -.102

Town B    Grade teachers           +.140                    +.135
          High-school teachers     +.631                    +.422

Town C    Grade teachers           -.249                    +.135
          High-school teachers     -.180                    +.340

Average (weighted for number
in each group)                     -.0386, ±.11             +.140, ±.10

                                   r between Local          r between Local
                                   Experience and General   Experience and General
                                   Teaching Ability as      Teaching Ability as
                                   Determined by Group A    Determined by
                                   Teachers' Judgments      Supervisors' Judgments

Town A    Grade teachers           +.047                    -.016
          High-school teachers     -.079                    -.066

Town B    Grade teachers           +.144                    +.089
          High-school teachers     +.364                    +.260

Town C    Grade teachers           -.148                    -.276
          High-school teachers     +.610                    +.416

Average (weighted for number
in each group)                     +.134, ±[illegible]      +.187, ±[illegible]

If there were little difference in the amounts of experience that
the teachers possessed, then the coefficients of correlation would
be low, not because experience was not a factor in determining
general teaching ability, but because all teachers had the same
amount of it. The amounts of experience, of age, and of
professional study in the case of the grade teachers of Town A have
already been shown to illustrate what the differences in experience
actually were.

The same is true of salary, age, or any other factors. In short, 
if series A and series B are to be correlated, no distribution in 
either A or B would mean a zero correlation. But in all of our 
series distributions do occur. The low correlations cannot be 
accounted for by absence of distribution. 
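
The effect of a narrow distribution upon a coefficient can be illustrated with a small constructed example. The figures are invented for the purpose and have no connection with the study's data. When only the middle of the range of x is kept, the coefficient falls from about .94 to about .87; with no spread at all the formula has a zero denominator, and the sketch refuses to compute it.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson coefficient; raises if either series has no spread."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("no distribution in one series")
    return sxy / sqrt(sxx * syy)

# Constructed series: y follows x, with a small alternating disturbance.
x_full = list(range(1, 11))
y_full = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

# Keep only the cases in the middle of the range of x.
middle = [(a, b) for a, b in zip(x_full, y_full) if 4 <= a <= 7]
x_mid = [a for a, _ in middle]
y_mid = [b for _, b in middle]

r_full = pearson_r(x_full, y_full)
r_mid = pearson_r(x_mid, y_mid)
print(round(r_full, 3), round(r_mid, 3))
```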

CORRELATION OF TEACHING ABILITY AND SALARY RECEIVED

At the time that this study was made there was no adequate
salary schedule operative in Town A. Although there were some
salary differences, many factors, other than those of services
rendered, were effective in determining the salaries which were
paid. In Town B and Town C, however, salary schedules, based
on merit, were already well started. The correlations, therefore,
are of particular interest.

                                   r between Salary and     r between Salary and
                                   General Teaching         General Teaching
                                   Ability as Determined    Ability as Determined
                                   by Group A Teachers'     by Supervisors'
                                   Judgments                Judgments

Town A    Grade teachers           Not computed             Not computed
          High-school teachers     Not computed             Not computed

Town B    Grade teachers           +.359                    +.410
          High-school teachers     +.130                    +.089

Town C    Grade teachers           +.575                    +.083
          High-school teachers     +.676                    +.615

Average (weighted for number
in each group)¹                    +.434, ±.08              +.263, ±.09

¹ Town A not counted; others weighted for size of groups.

It will be seen that the coefficients of correlation, although
they are not high, are at least positive, even in the judgments of
the teachers themselves. In view of the fact that men are paid
higher than are women, though not primarily for their better
service, but because of their sex, it may be a fact that a few
men in the grades are receiving relatively high salaries, although
they have only moderate ability. If this be so, then the
coefficients of correlation will be somewhat lowered. Perhaps we
should then more properly compute the correlations for women
only. In neither Town B nor Town C was this the case, for there
was only one man involved, and he was rated high and paid well.
I mention this possible condition, to caution those who are making
similar studies in which important sex differences occur.

In the case of the high-school teachers, however, a distinction
of sex should be made in order to eradicate sex as a factor, and
not ability as a factor, in the salaries which teachers receive.



THE RELATION BETWEEN GENERAL TEACHING ABILITY AND SCORES
MADE IN TWO PSYCHOLOGICAL TESTS¹

Approximately one hundred teachers were given psychological
tests.

The first test might be called a test of mental alertness. It has 
been used sufficiently in many other connections to warrant the 
placing of considerable confidence in it. This test was divided 
into two parts. Each part lasted somewhat over thirty minutes. 
Before the test was given, the teachers had an opportunity to 
look over a similar test so that unfamiliarity with the material 
would not be a handicap. The scoring and the methods of com- 
puting final ratings are standardized. 

                                   r between General        r between General
                                   Teaching Ability as      Teaching Ability as
                                   Determined by Group A    Determined by
                                   Teachers' Judgments      Supervisors' Estimates
                                   and Mental Alertness     and Mental Alertness

[Town label illegible]
          Grade teachers           -.099                    +.115
          High-school teachers     +.381                    +.346

[Town label illegible]
          Grade teachers           +.306                    +.230
          High-school teachers     +.545                    +.484

A second intelligence test was given to the grade teachers and
high-school teachers of Town A. The r's gained from the second
test were:

Town A    Grade teachers           +.060                    +.179
          High-school teachers     +.4[?]0                  +.648

¹ The tests used here are known as the first section of the Thorndike College
Entrance Examination.

The correlation between the two tests was +.812. It shows
to what extent the same ability was measured in both tests. The
correlation between general teaching ability in the elementary
grades, as estimated by teachers who served as judges, and
intellect, as measured by this test, is +.173, ±.10. The correlation
between general teaching ability, as estimated by supervisors
who served as judges, and intellect, as measured by psychological
tests, is +.156, ±.10.

The correlations are distinctly higher in the case of high-school
teachers. The mutual relationship between intellect and
teaching ability, as measured by teachers' estimates, is +.446,
±.16. When general teaching ability is estimated by the
supervisors the correlation is +.410, ±.16. These correlations are
averages which have been weighted for the size of teacher groups.

By using these two tests of separate measures of intellect and 
by using the Group A mutual judgments and the supervisors' 
estimates as two separate measures of teaching ability, we can 
correct for attenuation, and get a final correlation of +.57
between general teaching ability and scores which have been made
in psychological tests, as in the case of high-school teachers. 
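
The study does not show its working for this correction. A common form, due to Spearman, divides the mean of the cross-correlations between the two measures of one trait and the two measures of the other by the geometric mean of the two reliabilities. The sketch below assumes that form; the function name and all figures in it are hypothetical and are not meant to reproduce the +.57 reported here.

```python
from math import sqrt

def correct_for_attenuation(cross_rs, r_xx, r_yy):
    """Spearman's correction for attenuation (assumed form).

    cross_rs : correlations between each measure of trait x and each
               measure of trait y (e.g. four values for two measures each)
    r_xx     : correlation between the two measures of trait x
    r_yy     : correlation between the two measures of trait y
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    mean_cross = sum(cross_rs) / len(cross_rs)
    return mean_cross / sqrt(r_xx * r_yy)

# Hypothetical figures: four observed cross-correlations averaging .40,
# and reliabilities of .80 for each pair of measures.
corrected = correct_for_attenuation([0.40, 0.44, 0.36, 0.40], 0.80, 0.80)
print(round(corrected, 2))
```

Since errors of measurement can only lower an observed correlation, the corrected figure is never smaller in absolute size than the mean of the observed coefficients.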

The practically zero correlation between the teaching ability 
of grade teachers and mental alertness, as measured by test, does 
not mean that intellect is an irrelevant factor in teaching. For 
there is no occupation in which intellect is not to some extent 
useful. Even a man with a pick can use his intellect to advantage 
in deciding where best to grasp the handle of the pick, in deter- 
mining the distance which one foot should be ahead of the other, 
and in arriving at other conclusions. Intellectual differences, 
however, among those who use the pick are not as significant as 
they would be among surgeons, philosophers, or psychologists. 
Although brains are of use in picking, it is also true that physical 
strength, lung expansion, large nasal passages, and other factors 
are relatively of much greater importance than intellect. 

In elementary-school teaching, even in the most routine work,
intellect can be used and is used, but patience, industry,
sympathy, and other qualities are relatively of greater importance
than intellect. The differences in intellect among teachers are,
as it were, lost in the complexities of differences in the amount of
many other traits which are also important in elementary-school
teaching.

For high-school teachers this is not so correspondingly true.
There does appear to be some relationship between differences in
intellect and differences in teaching ability. High-school pupils
are more mature; the content of high-school subjects is less under
the spell of method than it is in the elementary-school subjects.
Therefore, sheer intellectual ability does operate in a way that
it does not seem to operate in the elementary-school teaching.
We have too few cases, however, from which to generalize.

THE SIGNIFICANCE OF ABILITY TO PASS A PROFESSIONAL TEST IN
RELATION TO GENERAL TEACHING ABILITY

Tests of a professional nature are often used as a means of 
determining a candidate's fitness for election or promotion. It 
was, therefore, entirely within our province to determine, as ac- 
curately as possible, the correlation between the ability to pass 
a professional test and the ability to teach. An examination,¹
which called more or less definitely for knowledge of the technique
of teaching, was given. The time allowed was seventy minutes.

While no objective means for correction² were obviously
available, due care in the correction work was taken. The names of
the teachers were not written on the papers until after the
corrections were made. By this method, any tendency on the part
of the examiner to favor some papers and to discriminate
against others was avoided.

For grade teachers in Town A the r between ability to pass a 
professional test and teaching ability as determined by teachers' 
ratings was +.450 (number of cases 33). The r between ability
to pass a professional test and teaching ability as rated by super- 
visors was +.767 (number of cases 33). 

For high-school teachers in Town A the r between ability to 
pass a professional test and ability to teach as determined by 
teachers' ratings was +.147 (number of cases 7). The r between 
ability to pass a professional test and ability to teach as rated by 
supervisors was +.001. 

It is unfortunate that more cases could not have been secured.
From the evidence which we have, it would seem that knowledge
which is required to pass a test such as the one referred to above
is not necessary for a person who wishes to be successful in
high-school teaching.

¹ A copy of the examination used is filed with the original data at
Teachers College, Columbia University. Revised copies, known as
"A Trade Test for Elementary-School Teachers" by Knight and Franzen,
can be secured from the writer.

² The correction of the examinations was made by the writer.

For elementary-school teaching such knowledge is much more
needed. Variations among teachers in their ability to pass a
test such as the one referred to above are more significant than
variations in their age, in their experience, or in their salary.

These data strongly suggest the practicability (1) of selecting 
high-school teachers by psychological test and (2) of selecting 
elementary-school teachers by a test which involves a knowledge 
of the technique of teaching. 

If professional tests could be made as accurate tests of technical 
knowledge as psychological tests are made tests of intellect, then 
the correlation between the achieved scores and success in ele- 
mentary-school teaching might well be measurably increased. 

Further, if high-school teachers were given a more extended 
psychological test, let us say at least three hours instead of one, 
as here indicated, then the results might be even more indicative 
of their teaching ability as a qualification for work in the high 
school. 

THE SIGNIFICANCE OF PROFESSIONAL STUDY WHILE IN SERVICE IN
RELATION TO GENERAL TEACHING ABILITY 

Much stress has lately been placed upon the value of profes- 
sional study while in service. Many school administrators place 
a high value upon summer-school and university study in which 
teachers may engage. In many cases salary adjustments are 
made in part, at least, upon the fact that a specified teacher has 
taken professional courses in education. 

It is of value to ascertain what effect this professional study 
has upon the general teaching ability of an individual teacher or 
group of teachers. We know in the cases of the teachers whom 
we have studied closely, how these teachers stand in general teaching
merit and also the amount of professional study while in
service which they have to their credit. 

Is it the case that those teachers who are studying their pro- 
fession are also the teachers who stand high in general teaching 
merit? Even if this were the fact, it would not be clear just what 
the fact might mean. For example, it might mean that, because 
a teacher studied, she gained power and was, therefore, a better 
teacher. It might mean that, because she was a good teacher, 
she was, therefore, deeply interested in the technique of teaching
and consequently studied. It might mean that the motive for
doing the proper thing is operative and that those teachers take
summer-school work who are most easily influenced by the desire
to please their administrative officers. It might mean that the 
correlation between good teaching and professional study is more 
or less fictitious. 

It is also true that many teachers would like to study, but, for 
domestic and financial reasons, cannot do so. The teacher may 
be good in her work because she studies. She may study, how- 
ever, because she is good in her work. On the other hand, whether 
she studies or not may have an indifferent relation to her merit. 
Finally, the true relation between professional study and teaching
ability may be a composite of all these possibilities, which is
probably the case. Unfortunately, for this consideration my data
are scant.

In Towns B and C so few teachers had done any organized 
study, while they were in service, that no relation could be estab- 
lished in these school systems between professional study and 
quality of service. In Town A enough of the teachers had done 
summer-school and university work while they were in service to 
make the study worth while. 

Six weeks of professional study in one course was counted as the
unit of measurement of professional study. Amounts of professional
study are not as good measures as amounts of study plus
quality, but the quality of professional study during service is a
fact too elusive to obtain. Those teachers who undertook no
professional study were, of course, rated as having done a zero amount.

The correlations follow:

                          r between General         r between General
                          Teaching Ability as       Teaching Ability as
                          Determined by Group A     Determined by
                          of Teachers' Judgments    Supervisors' Estimates
                          and Professional Study    and Professional Study

Grade teachers                   +.276                     +.381
High-school teachers             +.422                     +.364

Number of cases used in this computation, 52.

In Town A no teacher had been forced to study. While some 
premium had been placed on professional study, even those who 
had been given the opportunity of professional study were not
chosen from among the ablest teachers. This fact would tend to 
lower the correlation. We cannot, of course, tell whether some 
teachers who had done professional study would have been dif- 
ferently rated if they had not done so. We cannot tell, on the 
other hand, how other teachers would have been rated had they 
undertaken professional study. In view of the presence of irrele- 
vant factors which tend to lower the correlation, it seems fair to 
say that, under ideal conditions where all teachers can study, if 
they wish to do so, the true correlation between professional study 
while in service and teaching merit will be no lower and, in all 
probability, will be higher, than the correlation which we have 
obtained. 

The effect of professional study is as yet not clear. The fact 
of a positive though small correlation, in spite of factors which 
tend to lower it, seems to justify the use of such a factor as 
the amount of professional study for diagnostic and prognostic 
purposes. 

IS QUALITY OF PENMANSHIP AN INDEX OF TEACHING ABILITY? 

In the professional test (see page 27) there was a copy of a 
letter, uncapitalized and unpunctuated, which was to be copied. 
This, of course, would be interpreted by any teacher who took 
the examination as an exercise in punctuation and capitalization 
— and such it was. It might also be used as a very convenient 
test of the quality of a teacher's handwriting. We get a much 
truer picture of how teachers write under working conditions, by 
using material of this kind, than we can get if we merely asked 
teachers to furnish a specimen of their handwriting. 

This material, used as an index of the handwriting ability of 
teachers, was scored for legibility by using the Thorndike scale,
and quantitative values were assigned to the specimens of hand- 
writing. The scoring was made with the scorer ignorant of the 
names of the persons who wrote the specimens. The correla- 
tions between the legibility of the teachers' handwriting, as ex- 
pressed in terms of amount of legibility, and their general teach- 
ing ability rating follow: 
                          r between General         r between General
                          Teaching Ability as       Teaching Ability as
                          Determined by Group A     Determined by
                          of Teachers' Judgments    Supervisors' Estimates
                          and Legibility of         and Legibility of
                          Handwriting               Handwriting

Town A, Grade teachers           +.001                     +.012

That legibility of penmanship is no index of teaching ability 
seems clear. It should be added, however, that the variations in 
the legibility of the handwriting were small. Most of the teachers
are so-called "Palmer handwriting certificate holders." The
restricted spread in differences in handwriting ability is not 
needed to explain the zero correlation. As a matter of common 
sense there is no causal relation between handwriting and ability 
to teach. 

THE MUTUAL RELATION BETWEEN GENERAL TEACHING ABILITY
AND NORMAL-SCHOOL SUCCESS

The relationship between general teaching ability and normal-school
success has been obtained in two ways. First, a study was
made of the relation between those teachers who came from the 
same normal school and those teachers who now teach in the 
same group. By this rigorous requirement errors have been 
eliminated which would exist if comparisons were made with the
records of teachers who came from different normal schools, or 
if comparisons were made with the records for teaching ability 
which would come from equating the relative merits of teachers 
who work in different systems. 

It is exceedingly difficult to get, for any considerable number of 
teachers, accurate measures of the teachers' standings in normal 
schools and of the success in teaching of the same group. For, 
upon leaving the normal schools, the graduates scatter. In any 
school system there are teachers who come from so many different 
schools and at such widely varying times that large error is likely
to creep into any investigation of the relation of normal-school 
success to general teaching ability. 

While the procedure adopted in this study was calculated to 
reduce error and doubtless did reduce the error, it also reduced 
the number of cases. The correlations, which are positive, are 
for grade teachers, +.147 (19 cases); for high-school teachers
with college records, Town B, +.600 (6 cases). 

The standing in normal school or college was determined by a 
complex process. The grades which determined the total stand- 
ing were all added together and divided by the number of grades. 
All grades were not counted as having equal value. Thus a grade
"A" in English was not counted as the equal of a grade "A" in
history. The values of the grades "A," "B," "C," "D," "E" 
of each study were determined by taking all the grades of each 
study and by computing the percentage of each grade of the 
total. Then a probability curve for them of form "A" was as- 
sumed. A computation of their value in terms of the standard 
deviation (S.D.) distance from the mean was made. Thus the 
inequalities of grading in each department were to some extent 
at least neutralized. As an illustration, let us consider the 194 
grades in English, which were distributed as follows: 11 or 5 
per cent were "F"; 20 or 10 per cent were "E"; 37 or 19 per cent
were "D"; 74 or 38 per cent were "C"; 49 or 26 per cent were 
"B"; 3 or 1 per cent were "A." 

Assuming a normal distribution of ability and using the S.D.
values which are found in Thorndike's Mental and Social Measurements,
we assign to "E" a value of minus 20, to "D" a value of
minus 12, and so on.* The values of the several grades in the other
subjects were similarly determined. The grades received in
practice teaching, English, arithmetic, history, science, and
method were used in the computation. We do not know the
quantitative value of "F" in terms of "E" or "D." We cannot say
that "B" is twice as good as "E," etc. These marks can be used,
however, in denoting relative positions in a group or series.
These relative positions, in turn, can be translated into terms of
amount. The correlation was obtained between teaching ability
and normal-school standing for such persons only as were teaching
in the same group and came from the same normal school.
This rigorous method of selecting data reduced the number of
available cases to 19.
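
The translation of letter grades into amounts can be illustrated with a
short sketch. It is only an illustration under stated assumptions, not
the writer's actual computation: it assumes a standard normal curve,
takes the English grade counts as 11, 20, 37, 74, 49, and 3 (so that
they sum to the 194 grades reported), and gives each grade the mean
position, in S.D. units, of the band of the curve which that grade
occupies. The function name band_means is hypothetical. Thorndike's
table may use a different scale or convention, so these figures are not
expected to reproduce the values quoted above.

```python
from statistics import NormalDist


def band_means(counts):
    """Mean z-score of each grade band, lowest grade first.

    Assumes ability is normally distributed and that each grade
    occupies a contiguous slice of the curve whose area equals the
    share of pupils receiving that grade.
    """
    nd = NormalDist()  # standard normal: mean 0, S.D. 1
    total = sum(counts)
    means = []
    cumulative = 0
    for count in counts:
        lo_p = cumulative / total
        cumulative += count
        hi_p = cumulative / total
        # Density at the band edges; it is zero at minus and plus infinity.
        lo_pdf = nd.pdf(nd.inv_cdf(lo_p)) if lo_p > 0 else 0.0
        hi_pdf = nd.pdf(nd.inv_cdf(hi_p)) if hi_p < 1 else 0.0
        # Mean of a normal variable truncated to the band.
        means.append((lo_pdf - hi_pdf) / (hi_p - lo_p))
    return means


# Counts for F, E, D, C, B, A (assumed, see the note above).
english_counts = [11, 20, 37, 74, 49, 3]
for grade, z in zip("FEDCBA", band_means(english_counts)):
    print(f"{grade}: {z:+.2f} S.D.")
```

Because the values depend only on the share of pupils receiving each
grade, a department that marks severely and one that marks leniently
are put on a common scale, which is the purpose the text describes.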

* For the process see Thorndike's Mental and Social Measurements,
Table 54, p. 221.

This correlation of teaching ability with normal-school success
or standing is dependable, because it uses a very accurate rating
of teaching ability. The method of its determination has
eliminated errors in obtaining normal-school standing by taking only
those teachers from the same normal school and by evaluating
the marks which the teachers received in different studies.
Variations in the meaning of marks, however, may occur in one
subject from year to year as they do from subject to subject.
Whether marks actually do or do not vary is a matter of conjecture.
The real weakness in this correlation is due to the small
number of cases. Since a correlation using fewer than 50 cases
lacks numerical strength, the correlation between normal-school
standing and teaching ability has been computed in still another
way. The following assumptions have been made:

1. It is assumed that the median teacher in one system is, to 
all intents and purposes, equal in teaching ability to the median 
teacher in the other two systems, and that the summation of the 
quantitative variations from the median in one system is equivalent
to the summation of the variations in either of the other
two systems. While this is an assumption, the correctness of
which cannot be proved, it is reasonable. The three systems 
which have been studied are all within metropolitan Boston; they 
draw teachers from the same normal-school systems; they pay 
about the same salaries; they fit pupils for the same colleges; 
and they are not widely dissimilar in size. It is fair to assume 
that the teaching forces are about the same. 

2. It is assumed that the marks in any one subject in a normal 
school mean about the same as they do in any other subject in 
that school. This was the fact when the values of marks which 
were given in the several subjects by the Salem Normal School 
were computed for the first correlation. If the values of the 
marks varied to some degree, the final result would not be seri- 
ously affected, for the variations would be in all directions and 
would have the same effect as chance errors. 

3. It is assumed that, while the individual marks in one normal 
school do not mean the same as they do in another, the composite 
marks are comparable. That is, if we find that all the teachers 
whom we studied came from the Salem Normal School, then a 
certain teacher is the median in normal-school standing for that
group of teachers and her standing is, for our purpose, the same 
as the standing of the median teacher from any other normal- 
school group that we may study. It is necessary to make this 
assumption in order to get enough cases. It is not an unusual 
assumption, although objections to it are perfectly allowable on
mathematical or theoretical grounds. 

The normal schools which are studied are all in the same part 
of the country; they pay about equal salaries to their faculties; 
they have similar courses of study; they are in practically equal 
repute; they draw about the same class of pupils; they are super- 
vised, with one exception, from the same office; and they have 
such professional relations among themselves that their ideals and 
standards are largely mutual. Their graduates are attracted to 
the same school system and are equally well thought of. It seems 
reasonable to assume a practical identity of work required. This 
assumption is generally made in practical school administration. 
It is admitted it has statistical shortcomings, but its validity for 
this purpose may be allowed. Working upon these assumptions, 
the writer has computed the correlation between the normal-school
standing of 53 teachers and their success in teaching out in the 
field, which is +.333. The ratings given to teachers by their 
fellow-teachers were used as the quantity to represent teaching 
ability. The normal-school standing was the numerical value of 
the average grade received. 

THE VALUE OF PUPILS' ESTIMATES OF TEACHERS 

In school administration we have never taken into account the 
fact that the estimates of pupils of their teachers might be valu- 
able. The importance of having content to which pupils would 
respond and of having methods to which pupils would favorably 
react has been repeatedly discussed, but we have assumed, on the
whole, that pupils' judgments of their teachers were either unob- 
tainable or useless. 

We may yet find that there is a closer relationship between 
pupils' success in school and their reaction to the teacher than 
there is between their success and the methods of teaching read- 
ing, or the size of print in the text-books, or the amount of play 
space, or any other so-called important factor of school manage- 
ment. 

Pupils may be as competent judges of good teaching as anyone 
else. They are certainly the most concerned. Data will show 
that it is not the poor teacher in the eyes of the supervisor who is 
the good teacher in the eyes of the pupils. 

The estimates of the pupils were obtained by asking the pupils 
to write on a sheet of paper the names of all the teachers whom
they had ever had. Then it was explained to them that they were
going to say which of all these teachers, all things considered, was
the best. Reasonable precautions were taken in giving the
directions to have the pupils understand the meaning of "best" and
the importance of making the most deliberate ratings that they
could. In all cases the pupils were told not to write their own
names and not to hurry in their answers. After the name of the
teacher whom they considered best, they wrote the word "best."
After the next best teacher, they wrote the word "next"; and after
the third best teacher, they wrote the word "third." In this manner
the following groups of pupils judged their teachers: two groups
of high-school pupils, one of seventh-grade pupils and one of
eighth-grade pupils. To offset the factor of forgetting on the
part of the pupils, in the computation only those teachers whom
the pupils were having at the time of their making the judgments
were considered.

The two high-school groups were obtained by having the prin- 
cipal call together the forty most dependable pupils in the school. 
The elementary-school group was composed of 200 pupils in the
departmentalized grades of a school in Town A. The teachers 
who were judged by each group of pupils fell into three groups — 
11, 15, 13 — or sets of cases. The writer is certain that the pupils 
responded thoughtfully to his request for their judgments and 
that careful opinion was expressed. Each group of pupils' judg- 
ments was then divided into two chance groups and these were 
treated separately. The fact that the correlations between these 
groups were +.767, +.517, +.905 respectively for three groups
of pupils shows that factors of chance were not operating to any 
great degree. 

The correlations between the pupils' estimates of teachers and 
the estimates of the fellow-teachers and supervisors, follow on 
the next page. 

Using the two halves of the pupils' estimates as two independent
measures of the pupils' opinions and the mutual judgments of
teachers and the supervisors' estimates as two independent
estimates, we correct these correlations for attenuation. The
corrected correlation between pupils' estimates and adult estimate
of teaching ability is found to be +.784.
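
The idea of correcting for attenuation can be sketched as follows. The
form shown is the standard Spearman correction, in which an observed
correlation is divided by the square root of the product of the two
reliabilities; it is assumed here for illustration and is not taken
from the study, whose own working formula may differ in detail. The
function name and the numbers in the example are hypothetical and are
not the study's data.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    r_xy: observed correlation between two measures.
    r_xx: reliability of the first measure, e.g. the correlation
          between two independent halves of the pupils' estimates.
    r_yy: reliability of the second measure, e.g. the correlation
          between two independent adult estimates.
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical figures, chosen only to show the arithmetic.
print(round(correct_for_attenuation(0.60, 0.80, 0.75), 3))
```

The lower the reliabilities of the two measures, the more the observed
correlation understates the underlying relation, and the larger the
correction becomes.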
                                   r between Teaching     r between Teaching
                                   Ability as Determined  Ability as Determined
                                   by Pupils' Estimates   by Supervisors and
                                   and by Teachers'       as Determined by
                                   Estimates              Pupils

Town A
  Group A, Grade teachers                +.681                  +.875
  Group B, Grade teachers                +.380                  +.656
  Group A, High-school teachers          +.807                  +.682
  Group B, High-school teachers          +.600                  +.730

Town C
  Group A, Grade teachers                +.684                  +.631
  Group B, Grade teachers                +.604                  +.738
  Group A, High-school teachers          +.605                  +.806
  Group B, High-school teachers          +.451                  +.743

Average                                  +.578                  +.743

The fact that the correlation between groups of pupils' judg- 
ments was so high implies a real relation between those in whom 
the pupils have confidence and those in whom supervisors and 
fellow-teachers have confidence for their teaching ability. The 
weakness in these correlations is due to the large probable error, 
which is due to the small number (39) of teachers who are studied. 



THE SIGNIFICANCE OF INTERESTS 

The relation, if any, between success in teaching and interests 
in the various school subjects was determined by having the 
teachers fill out blanks which were constructed to reveal the rela- 
tive amount of interest that each teacher had in mathematics, 
history, literature, and science. 

An examination of the data clearly shows that teachers do not 
have distinct types of interests. The better teachers showed a
slight tendency to prefer what are usually considered the harder 
subjects. 



CHAPTER IV 

THE RELATIVE SIGNIFICANCE OF THE QUALITIES 

MEASURED 

We have obtained, in the case of six groups of teachers, a quan- 
titative rating for general teaching ability and we have correlated 
success in teaching with certain measurable facts about teachers. 
The number of cases involved was 153. In some of the correla- 
tions the total number of cases was not used. 

METHOD USED 

The process of obtaining the rating for general teaching ability 
has been explained and the process by which certain significant 
facts about the teachers were obtained has been explained. 

The Pearson formula for computing the coefficient of correlation,

$$ r = \frac{\sum x \cdot y}{\sqrt{\sum x^{2}}\,\sqrt{\sum y^{2}}} $$

was used in all cases. In this formula x is
the divergence from the central tendency in one distribution and
y is the corresponding divergence in the other. The weighted 
average correlation was obtained by weighting the correlation of 
each group on the basis of the number of cases in that group. 
The formula which was used for attenuation follows:

[Formula illegible in the source.]

In all cases reported the coefficients of unreliability of the
correlations have been computed. The formula used for this was:

[Formula for σ (true r − obtained r) illegible in the source.]

Where several correlations have been averaged, the weighting 
has been done on a basis of the number of cases. If, for example, 
the correlation between teaching ability and age was +.444 for a
group of 20 and +.666 for a group of 40, the average correlation
would be +.592, counting the 20-group once, the 40-group twice,
and then dividing by three. 
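
The two computations described in this section, the Pearson coefficient
and the averaging of correlations weighted by number of cases, can be
sketched in a few lines. This is a minimal illustration, not the
writer's worksheet: it assumes the arithmetic mean as the central
tendency from which divergences are taken, and the function names are
hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient from divergences about the mean."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # divergences in the first series
    dy = [y - mean_y for y in ys]  # divergences in the second series
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(
        sum(b * b for b in dy)
    )
    if denominator == 0:
        raise ValueError("a series with no variation has no correlation")
    return numerator / denominator


def weighted_average_r(groups):
    """Average several correlations, weighting each by its number of cases.

    groups: iterable of (r, number_of_cases) pairs.
    """
    groups = list(groups)
    total_cases = sum(n for _, n in groups)
    return sum(r * n for r, n in groups) / total_cases


# The worked example from the text: +.444 for 20 cases, +.666 for 40.
print(round(weighted_average_r([(0.444, 20), (0.666, 40)]), 3))
```

Weighting by 20 and 40 is the same as counting the first group once and
the second twice and dividing by three, as the text describes.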
THE MEANING OF CORRELATION

The correlations vary in size and in significance. The use of
correlation in this connection is for diagnostic purposes. For 
example, if we knew that the correlation between ability to pass
an intelligence test and ability to teach was +.999, all we would
have to know about a teacher would be her ability to pass an intel- 
ligence test in order to know how good a teacher she would be. 
Correlations of this sort do not exist. 

Teaching is not perfectly correlated with any one thing, except 
teaching ability. Perfect correlation between general teaching 
ability and any other single quality would mean for us complete 
identity between the two traits correlated, and teaching is not 
identically like any one quality or ability which we can as yet
measure. We cannot be sure just what qualities must be pos- 
sessed, or in what degrees, or in what combinations, for a teacher 
to be a successful teacher. We do know that more than one 
quality is needed. 

In all probability, we shall know at some time and with scien- 
tific precision why the good teacher is good and why the poor 
teacher is poor. We shall also possess at some time the means of 
securing a satisfactory measure. 

We are all certain that success in teaching does not "just
happen," but is due to the possession of certain traits in certain
amounts and in many combinations. Some minimum essentials 
can be stated, but at present we are not certain as to how we can 
use the knowledge of minimum essentials which we now have. 

We know, of course, that a stark idiot could not teach; but, on 
the other hand, we do not know how much intelligence is the ideal 
amount for the elementary teacher to possess. It is not at all 
certain that unusual intellectual attainments in a first-grade 
teacher are, all things being considered, worth paying for. It 
has never been shown that a teacher with an intelligence quotient 
of 180 is a better teacher, because of that rating, than a teacher 
with an intelligence quotient of 120. It is well within reason to 
suppose that too much intelligence among those who do some 
kinds of teaching work is a handicap, just as in a corresponding 
degree too little intelligence is a handicap to other teachers. 

Similarly, a certain amount of health is a minimum essential 
for teaching, but it has never been shown that the healthiest 
teachers are the best teachers. After a certain standard of health
is reached, more health may not be effective in improving the
quality of teaching.

We may yet find that certain ratios between height and weight,
certain ranges of body temperature, certain ranges of emotional
characteristics, certain qualities of vision and of eyesight, or
certain speeds in time reaction, or certain flexibilities of memory, or
certain degrees of blood pressure, are present in good teachers
and not in poor teachers. It could not, however, be stated as a
fundamental hypothesis that, after a certain degree of keenness
of vision is reached, still more keenness of vision will correlate
with, or bring about, better teaching.

Moreover, it is reasonable to assume that, to the extent that
we can determine relationships between effective teaching and
objective, measurable facts, we shall advance toward skill in
the rating and prognosis of teaching ability.

Suppose, for the moment, we found that the older a teacher
becomes, the better teacher she also becomes. Suppose, on the
other hand, that the more poorly a teacher wrote, the more
skillful she was in governing pupils. Although some qualities are
not constituents of teaching ability, as are intellect and
faithfulness, nevertheless they may still serve usefully as indices of
teaching ability.

If we could get enough measurable facts about a teacher and
then correlate them with teaching ability, we should be able to
rate teachers successfully. These measurable facts do exist and
our problem is to discover them and correlate them.

In reviewing the correlations which have been presented, the 
reader should keep in mind the simple interpretation of the mean- 
ing of correlation; namely, (1) that there is perfect correlation 
between two observable series of facts, if the presence of one fact
means the presence also of the other fact in the same way and in
the same relative degree; (2) that there is perfect negative
correlation, if the presence of one fact means the absence of the other.
For example, if we know that the older a teacher is, the poorer 
she is, then there will be perfect negative correlation between 
age and ability to teach. 

Zero correlation exists, if the relation between the two facts is 
such as would be produced by pure chance. Prediction is pos- 
sible, if the correlations are removed in size from zero. The 
greater the size of the correlation, other things being equal, the
more exact the prediction. 

The correlations between ability to teach and other sets of facts, 
which have been found in this study, after adjustments have been 
made in order that one correlation may best represent the facts, 
follow. 

Correlations between:

Ability to Teach and Age                                      +.082
Ability to Teach and Salary                                   +.348
Ability to Teach and Experience                               +.041
Ability to Teach and Intelligence as measured by test         +.164
Ability to Teach and Handwriting                              +.000
Ability to Teach and Knowledge of Teaching Technique
  as measured by professional test                            +.608
Ability to Teach and Study while in service                   +.328
Ability to Teach and Normal-school Scholarship                +.147
Ability to Teach and Pupils' Estimates of Teachers            +.784

Other significant correlations are:

                                                       Normal-School
                                   Test B    Test C    Standing

Test A (First Mental Test)          .812      .470      .559
Test B (Second Mental Test)                   .584      .536
Test C (Professional Test)                              .486

General teaching ability and success in normal-school studies
were correlated as follows: English, +.040; Arithmetic, +.001;
Geography, +.370; Science, +.268; History, +.235; Practice
Teaching, +.057.

Intellect, as measured by test, correlates with ability to pass 
a professional test in about the same degree as it does with normal- 
school standing, and ability to pass a professional test correlates 
a little lower with normal-school standing than does intellect, 
when the results of the two tests are pooled. 

Apparently the factor of intellect is quite significant in normal- 
school study, but, in comparison with other factors, it fades out 
in class-room work. 

Intellect is certainly operative in ability to pass a professional 
test, but it is uncertain whether the intellectual factors which 
operate in ability to pass a professional test are those which ac- 
count for the correlation between ability to teach and ability to 
pass a professional test, since the correlation between ability to 
teach and ability as revealed in psychological tests is itself so low. 
THE MORE IMPORTANT TRAITS

Some measurable facts do not appear to have prognostic value, 
while others do. We may now consider the interrelationships of 
four traits: general ability to teach; ability to pass a professional 
test; ability to pass a mental test; and standing in normal school or 
normal-school record. 

All of these traits or abilities are interrelated. There is some 
correlation between a teacher's standing in normal school and 
her subsequent ability to teach, her ability to pass a professional 
test, her ability to pass a mental test. Some relation exists be- 
tween each trait and every other trait. The amount of positive 
relationship between any two traits which appears in a simple 
correlation is affected by the influence of the other traits. The 
interrelationships are exceedingly complex. 

The problem may be analyzed as follows: 

Let G represent general teaching ability
Let I represent intellect as measured by test
Let P represent ability to pass a professional test
Let N represent normal-school record

G is related to I
G is related to P
G is related to N
I is related to P
I is related to N
P is related to N

The GI relationship is related to or affected by P
The GI relationship is related to or affected by N
The GI relationship is related to or affected by PN
The GP relationship is influenced by I and by N
The GN relationship is influenced by I and by P
The IP relationship is influenced by G and by N
The IN relationship is influenced by G and by P
The PN relationship is influenced by G and by I

The mutual relationship between ability to teach and ability 
to make scores in mental tests will be affected in some measure 
by one's standing in normal school, because one's ability to teach 
is affected by what one did in normal school and one's standing 
in normal school is also, more or less, a result of intellectual
ability.

By a statistical procedure of partial correlation the true rela- 
tion may be found in the following cases: 
G and I, when factors P and N are neutralized or non-operative
G and P, when factors I and N are neutralized or non-operative
G and N, when factors I and P are neutralized or non-operative 

To make partial correlations we must have measures in all 
traits for each person. It is exceedingly difficult to get measures
for all traits for many cases. We have, however, satisfactory 
measures of teaching ability, ability to pass a professional test, 
normal-school record, and a measure of intellectual keenness for 
29 elementary-school teachers. 

This is too small a number on which to base any sweeping 
conclusion. The method which has been used is the correct one, 
however, and is best adapted to find out what relationships exist 
between teaching ability and certain measurable traits. More 
cases, other studies, further investigations, must be made before 
the question can be finally answered. 

These total correlations were discovered:

General Teaching Ability and Intellectual Keenness .................. ±.000
General Teaching Ability and Ability to Pass a Professional Test .... +.541
General Teaching Ability and Normal-school Standing ................. +.153
Intellectual Keenness and Ability to Pass a Professional Test ....... +.108
Intellectual Keenness and Normal-school Standing .................... +.371
Ability to Pass a Professional Test and Normal-school Standing ...... +.560

These partial correlations were discovered:

General Teaching Ability and Intellectual Keenness .................. +.085
General Teaching Ability and Normal-school Standing ................. -.214
General Teaching Ability and Ability to Pass a Professional Test .... +.570
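
The procedure of partial correlation may be sketched in code. What follows is an illustration only: it applies the standard recursion formula for partial correlations to the six total correlations printed above, and the function and variable names are assumptions of this sketch, not part of the study. Since the author's hand computation and rounding are not shown, the values it yields need not agree exactly with the printed partial correlations.

```python
from math import sqrt

# Total (zero-order) correlations as printed in this section.
# G = general teaching ability, I = intellect as measured by test,
# P = ability to pass a professional test, N = normal-school standing.
R = {
    frozenset("GI"): 0.000,
    frozenset("GP"): 0.541,
    frozenset("GN"): 0.153,
    frozenset("IP"): 0.108,
    frozenset("IN"): 0.371,
    frozenset("PN"): 0.560,
}


def partial(x, y, controls=""):
    """Correlation of traits x and y with the traits in `controls` held constant."""
    if not controls:
        return R[frozenset(x + y)]
    z, rest = controls[0], controls[1:]
    # Remove the influence of z from the relation already freed of `rest`.
    r_xy = partial(x, y, rest)
    r_xz = partial(x, z, rest)
    r_yz = partial(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


for x, y, held in [("G", "I", "PN"), ("G", "N", "IP"), ("G", "P", "IN")]:
    print(f"r_{x}{y}.{held} = {partial(x, y, held):+.3f}")
```

The recursion removes one neutralized trait at a time, so the same function serves for first-order and second-order partials alike.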

From the partial correlations these deductions seem justifiable: 

1. The differences in mental keenness which are revealed in the 
passing of psychological tests do not correspond with differences 
in teaching success. 

2. The position that a student in normal school holds in her 
class is not indicative of her subsequent success as a teacher. 

3. The relative success achieved in passing a professional test 
is correlated positively and highly with success in teaching. 

4. Matters of such importance as we have been studying can- 
not be settled without similar investigation of many more cases, 
although in the present study the correct statistical method has 
been followed. 

This study indicates that age, experience, quality of handwriting,
intelligence as measured by tests, normal-school standing, or the
expressed interests of teachers are not closely related to success
in teaching.

We have, however, an indication of a mutual relationship be- 
tween teaching and a knowledge of the technique of teaching 
which challenges attention. Everyone must interpret this fact 
in the light of his own experience and judgment, until further
data have been analyzed. 

If the ability to pass a professional test were an index of teaching
ability because the teacher who teaches a long time learns how to
teach and also how to pass a test, that is, if experience or age were
the real sine qua non of good teaching, then that fact would have
appeared in our correlations of age and experience with general
teaching ability. It did not so appear.

Professional preparation, as indicated by normal-school stand- 
ing, does not appear to account for the +.570 correlation between 
teaching success and knowledge of technique, because normal- 
school standing, when correlated directly with teaching ability 
(50 cases), correlated only +.333. In the partial correlation the 
relation was even slightly negative. 

It is not the case that good teachers uniformly possess relatively
large amounts of pure intellectual alertness while poor teachers
uniformly lack it. For, with 100 cases, the correlation between
success in teaching and intellect, as measured by test, was very low,
and in the partial correlation an almost zero relationship appeared.

The most reasonable explanation seems to be along the line 
of a teacher's interest in her work. No other explanation is 
apparent nor is any other perhaps needed. The teacher who has a 
genuine interest in her profession will learn its technique and 
hence will pass well in a professional test. Those who have not a 
real devotion to their art will forget, or never take the trouble to 
master, the technique of their work. When, therefore, a test 
which requires technical knowledge is given to them without warn- 
ing, they will fail. 

Teaching, especially in the grades, will be well done by those 
who are sensitive to its problems and thoughtful of their solution. 
Interest of a substantial vital kind will explain the mutual rela- 
tionship between ability to pass a professional test and success in 
actual teaching. Moreover, the ability to pass a professional 
test may be taken as an index of real interest in, and of probable
success in, teaching.

If it were within the possibilities of any one study to procure 
enough cases upon which to base final conclusions, we could take 
the partial correlations between general teaching ability and 
several measurable factors and, by combining in a regression 
equation, say that teaching is a composite of measurable factor 
a, taken x times; factor b, taken y times; factor c, taken z times;
factor d, taken n times. 

Then, in order to rate teachers or to select them, the proper 
procedure would be to procure measures and combine them into 
a final rating. This has not been done and will not be done, until
some provision is made for a competent investigator with a staff 
of statisticians at his command to have access to at least 500 
elementary-school teachers, distributed in several types of 
school systems, and studied for a period of years. Until this 
situation is possible, proper checking of results is impossible. 
The method which has been used in this study would be in the 
main a satisfactory procedure for such an elaborate study. 
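
The regression idea of the two preceding paragraphs may be sketched as follows. This is an illustration under stated assumptions, not a result of the study: it supposes that all measures are expressed as standard scores, it uses the total correlations reported for the 29 elementary-school teachers earlier in this chapter, and the names `solve`, `weights`, and `composite` are hypothetical. With so few cases the weights carry no authority; the sketch shows only the form that a final rating would take.

```python
# Correlations among the three measurable factors, in the order
# I (intellect), P (professional test), N (normal-school standing).
R_XX = [
    [1.000, 0.108, 0.371],
    [0.108, 1.000, 0.560],
    [0.371, 0.560, 1.000],
]
# Correlation of each factor with G (general teaching ability).
R_XG = [0.000, 0.541, 0.153]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Standardized regression weights: how many "times" each factor is taken.
weights = solve(R_XX, R_XG)


def composite(standard_scores):
    """Final rating: each factor's standard score taken by its weight."""
    return sum(w * z for w, z in zip(weights, standard_scores))


print(dict(zip("IPN", (round(w, 3) for w in weights))))
```

The weights are obtained from the correlations alone, which is why the number of cases behind those correlations governs how far the final rating can be trusted.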



CHAPTER V 

THE RELATION BETWEEN SPECIFIC TRAITS WHEN 
SEVERAL JUDGES RATE THE SAME TEACHERS 
IN THOSE TRAITS 

THE THEORY OF ANALYSIS 

Teaching as a whole may be analyzed, for purposes of con- 
venience, into constituent parts, such as ability to ask questions; 
ability to direct study; ability to govern; ability to stimulate the 
moral health of the community; and kindred abilities. Theo- 
retically, such an analysis is possible. How much analysis of this 
kind is real, however, when the analysis is made on a basis of 
personal judgment is uncertain. 

In the first part of this study emphasis was placed on what 
many judges thought about a teacher in general. This was 
taken as an adequate basis of merit. Would it not be better to 
use analyzed judgments? 

School administrators have been using of late a score card which 
contains many traits of teaching. This score card is used as a 
basis, or as a method, or as a help, in aiding administrators to 
form their judgments concerning teachers. So much has been 
written on the subject of score-card rating, and students of educa- 
tional theory and practice are already so sufficiently informed 
concerning the score-card method of rating teaching, that a 
further review of the literature, other than the brief discussion in 
the Introduction of this study, is redundant. 

Teaching, in a certain sense, is an organic unity, but, in a very 
useful sense, it is also a composite of faculties, or traits, or func- 
tions, all of which are more or less disparate and separable. 
Accepting teaching in the latter meaning, we get genuine insight 
into the troublesome problems of estimating teaching ability and 
in rating teachers, if we list the constituents of teaching ability, 
assign a value to each ability, measure the amount in which each 
ability exists in any given teacher, and thus compute a final rating 
for teachers. The various score cards which have been devised 
for rating teachers attempt to solve, in various degrees of
completeness, just this sort of problem. The following data throw
light on what actually happens when analysis by personal judgments
is attempted.

DATA ON ANALYSIS 

The best way to present the data is to give a running account 
of how they were obtained. When the teachers in Towns A, B, 
and C rated each other for general teaching ability, they also 
rated each other for specific qualities. These qualities are those 
which could well be considered as significant and analyzed 
qualities of teaching ability. 

The complete instructions which were given to the teachers 
and the qualities which were to be rated will be seen from an 
inspection of the original instructions which are here reproduced. 

INSTRUCTIONS GIVEN TO TEACHERS

On this sheet you are requested to give certain ratings of each teacher in 
the list, including yourself. Please rate every teacher, and please be absolutely
frank in your ratings. You need not sign your name. Nobody will ever know
how you or anybody else rated him. No personal use will ever be made of any 
of these ratings. They will be used in a purely scientific study to determine 
the significance of age, education, early interests, etc., etc., for success as a 
teacher. The names will all be cut off and destroyed as soon as the different 
items in the inquiry have been numbered to fit the ones to whom they refer. 
Also, do not feel disturbed because in each respect somebody has to be rated
lowest. These ratings are all relative, and the lowest teacher in the group may 
well be of very great ability. Please be sure to record ratings even if they 
seem to you to be little better than mere guesses. The opinions of twenty men 
will give a useful rating, even if any one of the twenty taken alone is almost 
worthless. 

On the sheet is a list of the teachers. Choose the teacher of greatest teaching 
ability in the group and write a figure 1 after his or her name in column 1.
Choose the teacher next below in teaching ability and write 2 after his or her
name in column 1. Write 3 after the name of the one next in teaching ability
and so on with 4, 5, 6, etc. If two or more seem absolutely equal in teaching 
ability give them the same rating. 

Then think of the ability to understand and manage people, to get on with
other men, to secure obedience from inferiors, cooperation from equals, and 
consent and support from superiors in school, business, or other activities. 
Choose the teacher of greatest ability in the group and write 1 after his or her 
name in column 2. Proceed as for teaching ability ranking. 

Then think of intellectual ability, the ability to manage ideas, to work with
facts, rules, and principles, to learn the science of a thing, to understand 
explanations and reasons, to think things out. Choose the teacher of greatest 
intellectual ability in the group and write 1 after his or her name in column 3.
Proceed as for teaching ability rating. 

Then think of the ability to manage things and mechanisms, the ability to sail
a boat, to drive a motor car, to use tools, machines, and instruments of all sorts, 
to be handy. Choose the teacher of greatest ability to manage things and 
mechanisms in the group and write 1 after his or her name in column 4. Pro- 
ceed as for teaching ability rating. 

Then think of general scholarship, signs of education, knowledge of literature, 
etc. Choose the teacher of greatest general scholarship in the group and write 
1 after his or her name in column 5. Proceed as for teaching ability ranking. 

Then think of skill in government or discipline, ability to control, to keep
order, etc. Choose the teacher of greatest skill in government or discipline
in the group and write 1 after his or her name in column 6. Proceed as for
teaching ability rating.

Then think of instructional skill, pure ability to instruct, correct and effective
methods, economy of time and effort, ability to get all pupils to understand 
the subject-matter. Choose the teacher of greatest ability in instructional 
skill in the group and write 1 after his or her name in column 7. Proceed as 
for teaching ability rating. 

Then think of initiative, the making of headway, the starting of new means,
the stating of new ends. Choose the teacher with the greatest initiative in the 
group and write 1 after his or her name in column 8. Proceed as for teaching 
ability rating. 

Then think of nervous and physical strength. Choose the teacher of greatest 
nervous and physical strength in the group and write 1 after his or her name in 
column 9. Proceed as for teaching ability rating. 

Then think of that teacher who commands the greatest respect of the pupils 
in the group and write 1 after his or her name in column 10. Proceed as for 
teaching ability rating. 

Finally, think of general ability to get results. Choose the teacher with the
greatest ability to get results in the group and write 1 after his or her name 
in column 11. Proceed as for teaching ability rating. 

The actual ratings for eleven traits made on a prepared sheet 
(see illustration on next page) were secured for the 156 teachers 
in exactly the same way as the ratings for general teaching 
ability were secured. 

The ratings for general intellectual ability and for skill in dis- 
cipline were then treated as were the estimates of general teaching 
ability. For the six groups of teachers relative ratings for the
qualities, general intellectual ability and skill in discipline, were
obtained. These were turned into quantitative ratings as in the 
case for general teaching ability. As the statistical process was 
the same as that which was described in Chapter II, under the 
heading "Process of Rating Teachers," a description of the pro- 
cedure need not be here repeated. 
EXPLANATION OF COLUMNS

Col. 1. General ability as a teacher. 

Col. 2. General ability to manage people. 

Col. 3. General intellectual ability. 

Col. 4. Ability to manage things and mechanism. 

Col. 5. General scholarship. 

Col. 6. Skill in discipline. 

Col. 7. Ability to instruct. 

Col. 8. Initiative. 

Col. 9. Nervous and physical strength. 

Col. 10. Respect of pupils. 

Col. 11. General ability to get results. 

In all instances half of the ratings were selected by chance and
were treated as the data to form one rating, and the other half 
of the ratings were used to form another rating. Thus ratings for 
those two qualities for the six groups could be checked. 
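
The checking of one chance half of the ratings against the other may be sketched as below. The rankings are invented for the illustration, the helper names are hypothetical, and the mean rank is used as the rating of each half; the conversion of ranks into quantitative ratings described in Chapter II is not reproduced here.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_half_agreement(rankings, seed=0):
    """rankings[j][t] is the rank judge j gave teacher t (1 = best)."""
    rng = random.Random(seed)
    judges = list(range(len(rankings)))
    rng.shuffle(judges)                      # halves are chosen by chance
    half = len(judges) // 2
    group_a, group_b = judges[:half], judges[half:]
    n_teachers = len(rankings[0])
    rating_a = [mean(rankings[j][t] for j in group_a) for t in range(n_teachers)]
    rating_b = [mean(rankings[j][t] for j in group_b) for t in range(n_teachers)]
    return pearson(rating_a, rating_b)


# Invented rankings of five teachers by six judges, for illustration only.
rankings = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 2, 4, 3, 5],
    [2, 1, 3, 4, 5],
    [1, 2, 3, 5, 4],
]
print(round(split_half_agreement(rankings), 3))
```

A high correlation between the two halves shows only that the judges agree with one another; it does not by itself show what quality they were agreeing about.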



AGREEMENT BETWEEN TWO GROUPS OF JUDGES WHO JUDGE THE
SAME TEACHERS FOR THE QUALITY GENERAL INTELLECTUAL 
ABILITY 

The correlations between the halves of the judgments for the 
qualities, general intellectual ability and skill in discipline were 
then computed. One half of the judgments we shall call Group 
A and the other, Group B. The reader will recall that there was, 
in the case of mutual judgments for general teaching ability, a 
high correlation between the two chance halves of the judgments 
in the case of all six groups. We have the same condition
prevailing here.
These correlations are of interest: 

                                   r between Two Groups of
                                   Judges Who Judge the Same
                                   Teachers for General
                                   Intellectual Ability        No. of Cases

Town A   Grade teachers            +.861 ±.036                 63
         High-school teachers      +.967 ±.016                 16
Town B   Grade teachers            +.899 ±.031                 36
         High-school teachers      +.846 ±.079                 13
Town C   Grade teachers            +.968 ±.014                 30
         High-school teachers      +.326 ±.279                 10

Average (weighted for number
in each group)                     +.879 ±.016

These correlations show that there is close agreement among 
the teachers as to the distribution of general intellectual ability 
among them. In this case when two groups of judges estimate 
the differences of intellectual capacity of a corps of teachers the 
mutual agreement is on the average +.879 ± .016. This is a
weighted average of estimates of six different corps of teachers. 
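
A weighted average of this kind may be sketched as below, each correlation being taken as many times as there are cases in its group. That plain weighting is an assumption of the sketch; the exact weighting used for the printed average is not stated in this section, and the figure obtained here need not reproduce it.

```python
# (group, r between chance halves, number of cases) as printed above.
groups = [
    ("Town A grade", 0.861, 63),
    ("Town A high school", 0.967, 16),
    ("Town B grade", 0.899, 36),
    ("Town B high school", 0.846, 13),
    ("Town C grade", 0.968, 30),
    ("Town C high school", 0.326, 10),
]


def weighted_mean(pairs):
    """Average of the r values, each weighted by its number of cases."""
    total_cases = sum(n for _, n in pairs)
    return sum(r * n for r, n in pairs) / total_cases


average = weighted_mean([(r, n) for _, r, n in groups])
print(round(average, 3))
```

Weighting by cases keeps the small high-school group of Town C, with its low and uncertain correlation, from pulling the average down as far as an unweighted mean would.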

These correlations are also of interest: 

AGREEMENT BETWEEN TWO SETS OF JUDGES IN RATING A GROUP OF TEACHERS
FOR THE TRAIT SKILL IN DISCIPLINE

                                   r between Two Groups of
                                   Judges Who Judge the Same
                                   Teachers for Skill in
                                   Discipline

Town A   Grade teachers            +.943 ±.015
         High-school teachers      +.896 ±.060
Town B   Grade teachers            +.767 ±.071
         High-school teachers      +.581 ±.180
Town C   Grade teachers            +.728 ±.086
         High-school teachers      +.917 ±.049

Average (weighted for size of group)   +.838 ±.023

With skill in discipline, as with general teaching ability and
general intellectual ability, we find that the size of the correlation
indicates substantial agreement among the judges, which is very far
from what chance guesses or haphazard opinions would have produced.

This agreement between chance halves of judgments for the 
qualities, intellectual ability and skill in discipline, does not prove
that intellectual ability, as such, or skill in discipline, as such, 
were the traits actually rated, although an easy interpretation of 
the data might lead one to think so. This agreement simply 
means that on the whole the judges had the same quality or trait 
in mind and really did agree as to the distribution of amounts of 
the qualities or traits. 

What agreement there would have been concerning other traits 
we do not know. It is fair to assume, however, that equally high 
agreement exists. The three traits which we treated statistically 
show uniformly high agreement and the enormous amount of
time required to work out other ratings seems unnecessary, when 
the first three treated show the amount of agreement that is 
present. 

The important fact is that when teachers rate each other for 
general teaching ability, or for a specific quality, such as skill in 
discipline, chance halves of the ratings mutually correlate so 
highly that substantial agreement is fairly established. The 
average correlation between chance halves of judges when judging 
the same group of teachers for the same qualities is +.872. 
This calculation is based on the average of eighteen sets 
of judgments. 

THE ABSENCE OF ANALYSIS IN RATING 

To find out how much actual analysis is made when judgments 
for specific traits are recorded, we shall correlate the ratings for 
general teaching ability with general intellectual ability; general
teaching ability with skill in discipline; general intellectual ability
with skill in discipline; and then interpret the correlations which
have thus been obtained. 

What relation is there between ability to teach and intellectual 
ability when both traits are judged by mutual ratings? As we 
have here two independent measures for each trait, we can correct 
for attenuation and get a reliable finding. The independent
measures are, of course, the two ratings of the two chance groups, 
"A" and "B" mentioned before. 
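
The correction for attenuation is commonly made by Spearman's formula: the geometric mean of the two cross-correlations is divided by the geometric mean of the two reliabilities, the reliability of a trait being the correlation between its own two chance halves. The sketch below assumes that formula, since the formula actually used is not printed in this section. The function name is hypothetical, and the reliability of .94 for general teaching ability is an assumed placeholder, not a figure taken from this chapter.

```python
from math import sqrt


def correct_for_attenuation(r_1a_2b, r_2a_1b, reliability_1, reliability_2):
    """Spearman's correction, assuming all four correlations are positive.

    r_1a_2b: trait 1 as rated by group A against trait 2 as rated by group B
    r_2a_1b: trait 2 as rated by group A against trait 1 as rated by group B
    reliability_1, reliability_2: group A against group B on the same trait
    """
    return sqrt(r_1a_2b * r_2a_1b) / sqrt(reliability_1 * reliability_2)


# Cross-correlations and the .861 reliability are figures printed for the
# grade teachers of Town A; the .94 reliability is an assumption (see above).
corrected = correct_for_attenuation(0.927, 0.802, 0.94, 0.861)
print(round(corrected, 3))

# When the cross-correlations exceed the reliabilities the quotient passes 1.
print(correct_for_attenuation(0.90, 0.90, 0.80, 0.80) > 1)
```

Because the divisor is never more than 1, the corrected value is never smaller than the geometric mean of the cross-correlations, and with unreliable halves it may exceed +1.0, which is impossible for a true correlation.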




THE CORRELATIONS BETWEEN (I) GENERAL TEACHING ABILITY AND (II)
GENERAL INTELLECTUAL ABILITY WHEN THE SAME JUDGES
JUDGE THE SAME TEACHERS FOR TWO TRAITS FOLLOW

                                                                    r between Traits
                                   r between I A   r between II A   I and II, Corrected
                                   and II B (1)    and I B (2)      for Attenuation

Town A   Grade teachers            +.927 ±.019     +.802 ±.049      +.957 ±.011
         High-school teachers      +.925 ±.037     +.822 ±.083      +.937 ±.030
Town B   Grade teachers            +.899 ±.032     +.919 ±.026      +1.000+ (3)
         High-school teachers      +.919 ±.043     +.791 ±.103      +.925 ±.041
Town C   Grade teachers            +.461 ±.143     +.859 ±.047      +.713 ±.08?
         High-school teachers      +.944 ±.034     +.260 ±.294      +1.000+ (3)

Average (weighted for size
of group)                          +.847 ±.018     +.819 ±.026      +.935 ±.014

(1) I A and II B is read: the correlation between teaching ability as rated by one group of
judges and intellectual capacity as rated by another group of judges.

(2) II A and I B is read the same, except that the groups of judges estimate the traits in
reverse order.

(3) The two correlations above +1.0 of course are wrong in the sense that we could not have
more than perfect correspondence.

The correlation between general teaching ability and general
intellectual ability, when weighted for size of groups, is +.936
± .014. At first glance, it would seem as if there were an
astoundingly high mutual relationship between ability to teach and
general intellectual ability.

The correlations, however, should be given more than passing
notice. First, let us go back to the eleven traits on the original
rating sheets. In a sense, these original rating sheets might be
considered score cards extended from one person judged and one
person rating to many persons judged and many persons scoring.
We might further think that general teaching ability is a composite
of the other ten traits which have been mentioned. At least
these are among the more important traits mentioned on score
cards in general. Our correlation of +.935 ± .014 between general
teaching ability and general intellectual ability could be
variously interpreted. It is exceedingly important that the
interpretation should be correct.

First, we might conclude that the judges kept general teaching
ability and general intellectual ability clearly distinct from each
other in their minds when they were rating and that they actually
found that there was this high mutual relationship between
intellect and pedagogic skill. From this reasoning it might fairly
be held that the stronger a teacher was, the abler she would be
mentally, and, conversely, that mental vigor implies a
corresponding degree of teaching power.

Second, on the other hand, this correlation of +.935±.014 
may be interpreted as an inability on the part of the judges 
to distinguish effectively between teaching strength and intellectual
capacity in persons judged by them. In other words, a judge
has a certain opinion of a teacher in toto, and his opinion is given 
according to his general impression in answer to any significant 
question about that teacher. Thus, the general estimate may be 
taken to permeate all particular judgments, and, conversely, 
particular judgments are simply defenses for, or justifications of, 
the general opinion which has thus been held. 

To make this still clearer, let us assume that a person likes a 
certain picture. If this like is strong enough, it will not vary 
from whatever point of view the picture may appear. Let it 
stand on the right of the person; he will still like it. Let him see 
the picture from the left; he will still like it. The total effect 
being pleasing, it will not be hard so to rationalize his thinking 
that the background, the middle, and the foreground will all 
appear to be well painted. The detail will be correct or overlooked,
and the main features will be good or easily condoned.
We can very well term this process the spreading of a halo of 
general effect to all particular parts. 

So it might well be in judging a teacher. Looked at from the 
right or the left, from the aspect of intellect or from that of gen- 
eral ability to teach, the general opinion will still be present and 
will be the basis upon which the judgment is formed. This is 
apparently the most reasonable interpretation of the correlation. 
In many of our school practices we have assumed for ourselves 
the ability to analyze an organic whole and an ability to judge the 
parts of a person, irrespective of the whole; but, when we actually 
check up our mental processes, we see that this ability, if it exists 
at all, exists in a very small degree. 

It appears that this spread of the general estimate enters into
our particular judgments to a degree little before expected. For
it is to be doubted whether anyone would seriously hold that a
correlation of +.935 ± .014 really exists between general teaching
ability and general intellectual ability.
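
How a spread of the general estimate could by itself produce such a correlation may be shown by a small simulation. Everything in it is hypothetical: the two true traits are made independent, the rating of intellect is assumed to be nine parts general impression of teaching to one part true intellect, and the names and figures are chosen only for the illustration. It is a sketch of the argument, not an analysis of the data of this study.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_halo(n_teachers=2000, halo=0.9, noise=0.3, seed=1):
    rng = random.Random(seed)
    # True traits, drawn independently of each other.
    teaching = [rng.gauss(0, 1) for _ in range(n_teachers)]
    intellect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # The rating of teaching follows the true trait with some error.
    rated_teaching = [t + rng.gauss(0, noise) for t in teaching]
    # The rating of "intellect" is mostly the general impression of teaching.
    rated_intellect = [
        halo * t + (1 - halo) * i + rng.gauss(0, noise)
        for t, i in zip(teaching, intellect)
    ]
    return {
        "true traits": pearson(teaching, intellect),
        "rated traits": pearson(rated_teaching, rated_intellect),
        "rated vs true intellect": pearson(rated_intellect, intellect),
    }


for name, r in simulate_halo().items():
    print(f"{name}: {r:+.3f}")
```

Under these assumptions the two ratings correlate highly although the traits themselves do not, and the rating called intellect has little to do with intellect.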

The reader will remember that in about 100 cases we determined
intellectual differences by means of standardized tests.
The correlation between general teaching ability and intellect, as
measured by tests, was extremely low (+.164 was the average).
Either the tests are not measures of intellect at all and hence the
correlation +.164 is false, or the judgments of intellect include so
many other qualities that they really are not judgments of intellect
at all and the +.935 correlation is false.

It should be remembered that teachers are already a highly 
selected group. There could hardly be any correlation of +.936 
between any two traits which were not practically identical. 
Since we know that the tests which were used were more than 
indifferent tests of what goes by the name of intellect, we are 
fairly correct in our conclusion that the correlation of +.936 
between general teaching ability and general intellectual ability
as estimated by judgments shows not an estimate of intellect to 
have been made, but rather an estimate of general ability under 
the name of intellect. The analysis in these instances simply was 
not made! 



THE CORRELATION BETWEEN ABILITY TO TEACH AND SKILL IN
DISCIPLINE

We have two separate measures of general teaching ability and
two separate measures of skill in discipline. Correlations between
the "A" group of judges' estimates for general teaching ability
and the "B" group of judges' estimates for skill in discipline are
recorded under the caption I A and VI B. The correlations under
the caption I B and VI A are the correlations between the "B"
group of judges' estimates of general teaching ability and the
"A" group of judges' estimates for skill in discipline. The third
column gives the correlations which have been corrected for
attenuation.

                                                                    r between Trait I
                                                                    and Trait VI,
                                                                    Corrected for
                                   I A and VI B    I B and VI A     Attenuation

Town A   Grade teachers            +.776 ±.055     +.712 ±.068      +.787 ±.052
         High-school teachers      +.650 ±.149     +.829 ±.080      +.789 ±.094
Town B   Grade teachers            +.686 ±.089     +.580 ±.112      +.699 ±.085
         High-school teachers      +.703 ±.140     +.767 ±.081      1.000+ (1)
Town C   Grade teachers            +.679 ±.098     +.696 ±.094      +.824 ±.058
         High-school teachers      +.900 ±.060     +.625 ±.192      +.964 ±.022

Average (weighted for size
of group)                          +.741 ±.036     +.703 ±.040      +.789 ±.001

(1) Correlations above +1.0 of course are wrong in the sense that we could not have
more than perfect correspondence.
The correlation between general teaching ability and skill in
discipline, when weighted for size of groups, is +.789 ± .001.
Here again we find a higher correlation than we would ordinarily 
expect. It is not higher than that between general teaching 
ability and general intellectual ability, although we would cer- 
tainly hold that it should be. This is accounted for by the fact 
that disciplinary skill can be better judged than intellect, and, 
therefore, the tendency to spread a judgment might be lessened; 
but there is more of the explanation in the fact that discipline was 
the sixth trait that was rated. By the time that the sixth column 
is reached there is a fairly definite temptation to vary ratings as 
a matter of principle or as a device to relieve monotony, or simply 
because one wants to. 

It is fair to assume that, if discipline had been the second rather 
than the sixth trait to be rated, the correlation would have been 
higher. By how much is, of course, uncertain. As far as tradi- 
tion goes the correlation should have been higher between general 
teaching ability and skill in discipline than between general teaching
merit and intellectual strength. For it is everywhere assumed
that in public-school teaching, skill in discipline is the first requi- 
site. The fact that 153 teachers in groups rating each other found 
a higher mutual relationship between general teaching ability and 
general intellectual ability than between general teaching ability 
and skill in discipline is, to say the least, interesting. It is also 
hard to account for, except by the fact that judgments of particu- 
lar traits are really defenses of general estimate rather than esti- 
mates of particular traits which have been considered in isolation. 
Of course, governing skill is a constituent of good teaching, 
but that the true correlation is as high as +.787 is to be much 
doubted. If it were true, it should mean that the drill sergeant 
would be the best teacher. It would also imply that mere order- 
keeping was a larger part of instruction than we believe it to be. 
The factor of spread of general opinion is also present here. 

The correlation between general intellectual ability and skill in 
discipline, when weighted for size of groups, is +.719 and when 
corrected for attenuation is +.863 ± .020.

The correlation between Trait II and Trait VI, when corrected
for attenuation, gives the final correlation between general
intellectual ability and skill in discipline. This correlation, instead of
revealing the fact of the case, is, if taken at its face value, nothing
short of preposterous. Were this really the truth, what a prodigy
of intellect the "strict," but often dull, teacher would be! If we
thus generalized, we would also hold that Grant, admittedly a past
master in control, also towered above Lincoln in mental stature.

THE CORRELATION BETWEEN GENERAL INTELLECTUAL ABILITY AND SKILL
IN DISCIPLINE

                                                                    r between Trait II
                                                                    and Trait VI,
                                   II A and        II B and         Corrected for
                                   VI B (1)        VI A (2)         Attenuation

Town A   Grade teachers            +.700 ±.070     +.800 ±.049      +.941
         High-school teachers      +.525 ±.187     +.805 ±.090      +.698
Town B   Grade teachers            +.766 ±.072     +.609 ±.106      +.824
         High-school teachers      +.688 ±.183     +.789 ±.104      +.968
Town C   Grade teachers            +.915 ±.029     +.663 ±.102      +.932
         High-school teachers      +.024 ±.316     +.760 ±.297      +.245

Average (weighted for size
of group)                          +.697 ±.042     +.741 ±.036      +.863

(1) II A and VI B is read: the correlation between Group A judges' estimate of general
intellectual ability with Group B judgments of skill in discipline, when both groups of
judges are judging the same teacher.

(2) II B and VI A is similarly interpreted.


THE INFLUENCE OF GENERAL ESTIMATE



The factor of spread of general opinion to particular traits is 
here well illustrated. We must remember that these teachers 
are a relatively selected group for general intellectual ability. 
All are graduates of normal school or college, and this would tend 
to lower the correlation. Of course, there is some correlation 
between general intellectual ability and skill in discipline. A 
stark fool could not control a class, but common sense would 
prohibit us from believing that any such mutual relationship as a 
correlation of +.863 suggests is the actual fact. 

We would also deny, irrespective of these or any other data 
likely to be presented, that there was no closer relationship 
between teaching ability and discipline than between intellect 
and discipline. And yet our findings, if interpreted literally, 
show this. 

This factor of spread of general estimate can be illustrated 
in another way. Allow, for the purpose of the illustration, that 
the supervisors' estimates of general teaching merit adequately 
represent the facts. If we correlate what the teachers rated as 
intellect with what the supervisors rated as general ability, we 
get valuable evidence that teachers rate teaching ability even 
when they are asked to rate general intellectual ability. The 
same thing can be done for teachers' estimates for skill in disci- 
pline and supervisors' ratings for general teaching ability. These 
correlations are as follows: 

r BETWEEN SUPERVISORS' ESTIMATE OF GENERAL TEACHING ABILITY AND 
GENERAL INTELLECTUAL ABILITY AS ESTIMATED BY GROUPS OF TEACHERS 

                               As Estimated by      As Estimated by
                               Group A Teachers     Group B Teachers

Town A   Grade teachers        +.883 (62 cases)     +.869
         High-school teachers  +.840 (16 cases)     +.886
Town B   Grade teachers        +.946 (36 cases)     +.971
         High-school teachers  +.611 (13 cases)     +.760
Town C   Grade teachers        +.768 (30 cases)     +.741
         High-school teachers  +.477 (10 cases)     +.999

r BETWEEN SUPERVISORS' ESTIMATE OF GENERAL TEACHING ABILITY AND 
SKILL IN DISCIPLINE AS ESTIMATED BY GROUPS OF TEACHERS 

                               As Estimated by      As Estimated by
                               Group A Teachers     Group B Teachers

Town A   Grade teachers        +.786                +.680
         High-school teachers  +.460                +.708
Town B   Grade teachers        +.679                +.729
         High-school teachers  +.662                +.847
Town C   Grade teachers        +.769                +.966
         High-school teachers  +.890                +.772
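
Several averages in this chapter are weighted for size of group. A minimal sketch of such a weighted mean follows, using the Group A column for general intellectual ability and the numbers of cases printed beside it. The function name is illustrative, and the sketch covers one column only, so its result is not expected to equal any pooled average reported in the text.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each counted in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Numbers of cases and Group A correlations, in table order:
# Town A grade, Town A high school, Town B grade, Town B high school,
# Town C grade, Town C high school.
cases = [62, 16, 36, 13, 30, 10]
group_a = [0.883, 0.840, 0.946, 0.611, 0.768, 0.477]

mean_a = weighted_mean(group_a, cases)
print(round(mean_a, 3))
```

Weighting in this way keeps a coefficient based on 10 cases from counting as heavily as one based on 62.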

The average correlation, weighting for size of group judgments, 
between supervisors' estimates of general teaching ability and 
mutual judgments of the teachers for general intellectual ability 
is +.876; between supervisors' estimates of general teaching 
ability and mutual judgments of the teachers for skill in disci- 
pline, +.744. We could not hold that any such relation really 
held between general teaching ability and either general intellec- 
tual ability or skill in discipline. These correlations are another 
and sufficient evidence of the fact that in analyzed judgments the 
factor of the spread of the general estimate is present in a most 
vicious form. 

The factor of spread is shown by these data: 

CORRELATIONS, CORRECTED FOR ATTENUATION, BETWEEN GENERAL TEACHING 
ABILITY AND GENERAL INTELLECTUAL ABILITY, WHEN BOTH ARE JUDGED 
BY GROUPS OF TEACHERS 

Town A   Grade teachers                         +.957
         High-school teachers                   +.937
Town B   Grade teachers                         +1.000
         High-school teachers                   +.925
Town C   Grade teachers                         +.713
         High-school teachers                   +1.000

Average (weighted for size of groups)           +.935 ± .014

CORRELATIONS BETWEEN GENERAL TEACHING ABILITY AND SKILL IN DISCIPLINE, 
WHEN BOTH ARE JUDGED BY GROUPS OF TEACHERS 

Town A   Grade teachers                         +.787
         High-school teachers                   +.789
Town B   Grade teachers                         +.698
         High-school teachers                   +1.000
Town C   Grade teachers                         +.824
         High-school teachers                   +.964

Average (weighted for size of groups)           +.789 ± .041

CORRELATIONS BETWEEN SKILL IN DISCIPLINE AND GENERAL INTELLECTUAL 
ABILITY, WHEN BOTH ARE JUDGED BY GROUPS OF TEACHERS 

Town A   Grade teachers                         +.941
         High-school teachers                   +.698
Town B   Grade teachers                         +.824
         High-school teachers                   +.968
Town C   Grade teachers                         +.932
         High-school teachers                   +.245

Average (weighted for size of groups)           +.863 ± .025

CORRELATION BETWEEN ABILITY TO TEACH, AS JUDGED BY SUPERVISORS, AND 
GENERAL INTELLECTUAL ABILITY, AS JUDGED BY GROUPS OF TEACHERS 

Average                                         +.876

CORRELATION BETWEEN ABILITY TO TEACH, AS JUDGED BY SUPERVISORS, AND 
SKILL IN DISCIPLINE, AS JUDGED BY GROUPS OF TEACHERS 

Average                                         +.744

It would seem that, when estimates are made of specific traits 
and such high correlations are obtained between the traits, a 
damaging factor of spread of general estimate must be allowed 
as a fact. 

Some conclusions follow: 

First, that teachers, when rating each other for specific quali- 
ties, such as intellect or skill in discipline, agree in their estimates. 
This is shown by correlating the ratings of the same teachers for 
the same trait. These correlations average +.858. 

Second, that when ratings are made for specific qualities, a 
correlation between these ratings and those for general teaching 
ability is so high that a very great spread of the general estimate 
is present in the judgments for particular qualities. 

Third, this factor of the halo of general estimate, being present 
in particular judgments, is further shown by correlating the 
ratings for general teaching ability as they are given by the super- 
visors with ratings for particular qualities obtained by group 
judgments. Some average correlations are given below: 

General teaching ability, as obtained by group judgments, 
with general intellectual ability, similarly obtained        +.935 ± .014 

General teaching ability, as obtained by group judgments, 
with skill in discipline, similarly obtained                 +.789 ± .041 

General intellectual ability, as obtained by group 
judgments, and skill in discipline, similarly obtained       +.863 ± .020 

General teaching ability, as obtained by supervisors' 
estimates, with general intellectual ability, as obtained 
by group judgments                                           +.876 ± .020 

General teaching ability, as obtained by supervisors' 
estimates, with skill in discipline                          +.744 ± .091 

Fourth, when analysis is attempted, analysis is not obtained, 
but ratings are obtained and these ratings are vitally influenced 
by the general estimate. 

It might be urged that this factor of spread of general estimate 
was greatly stimulated by the method of scoring, by the nature 
of instructions which were given, or by other reasons. 

FAILURE OF ATTEMPTS AT ANALYSIS 

To check up the factor of spread in other circumstances, the 
analyzed ratings have been obtained of 129 teachers in a New 
York school system. Here a regular Boyce score card was used 
and the teachers were rated by their superintendent, by their 
respective principals, and by their supervisors. Each teacher 
was rated for forty-five distinct qualities. A list of these quali- 
ties is found on page 64. 

Some of the usual errors in rating were present. One error 
that is worth mentioning is that, although the instructions 
specifically pointed out that good means above the average, the 
distributions of ratings were in part skewed somewhat sharply from a 
normal distribution. 

Within any group of sufficient size it may be assumed for statis- 
tical purposes that the following distribution will satisfactorily 
represent the facts: 10 per cent very poor; 20 per cent poor; 
40 per cent medium or average; 20 per cent good; and 10 per 
cent excellent. 

The distributions of the ratings which were given follow: 

General Teaching Ability: 

              No.   Per Cent   Normally Should Be (Per Cent)
Very poor       0      0.0                 10
Poor            5      3.9                 20
Medium         34     27.8                 40
Good           71     55.0                 20
Excellent      19     14.7                 10

Skill in Discipline: 

              No.   Per Cent   Normally Should Be (Per Cent)
Very poor       1      0.7                 10
Poor            4      3.0                 20
Medium         27     20.9                 40
Good           61     47.2                 20
Excellent      36     27.9                 10

General Intellectual Ability: 

              No.   Per Cent   Normally Should Be (Per Cent)
Very poor       0      0.0                 10
Poor            0      0.0                 20
Medium         32     24.8                 40
Good           80     62.0                 20
Excellent      17     13.1                 10

The distributions for the ratings in voice are similarly massed 
and are also high. The distributions for the other traits, for 
which ratings were made, have not been worked out. The extent 
of the mismarking can be seen in the case of general intellectual 
ability. 

Assuming the least possible error and assuming the approximate 
truth that a normal distribution of mental strength will be 
present in a group as large as that which has been here considered, 
both of which assumptions may fairly be made, we found 
that only 17 per cent of the teachers received a proper rating. 
We also found 16 per cent of the teachers were rated two steps 
too high. The remainder were misrated by one step. 

All were "rated up." The same fault thus appeared in score- 
card rating as in general-estimate rating. This skewing of the 
distributions is of importance to us, not because it shows that 
actual ratings hardly correspond with the probable facts, but 
because it reduces the number of groups or the spread of the dis- 
tribution. When correlations between traits are computed from 
data so greatly restricted in range, the correlations are lowered 
considerably not because a low correlation is the ultimate fact, 
but because the lack of spread in the distribution reduces the cor- 
relation mathematically. This rather technical consideration 
need not unduly concern us, for the factor of spread of judgment 
may be shown in quite another way, through correlations between 
qualities so large that only undue spread of judgment can ac- 
count for them. 
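
The mathematical effect of a restricted range can be illustrated with simulated figures. The sketch below is an assumption-laden illustration, not a reanalysis of the ratings: it draws artificial pairs of scores with a population correlation of .7 (the value, the sample size, and the seed are all arbitrary choices), then recomputes the correlation after discarding every case below the mean on the first score, much as "rating up" discards the lower categories.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.7, seed=0):
    """Artificial paired scores whose population correlation is rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = simulate()
full_r = pearson(xs, ys)

# Keep only the upper half of the range on the first score.
kept = [(x, y) for x, y in zip(xs, ys) if x > 0]
restricted_r = pearson([x for x, _ in kept], [y for _, y in kept])

print(round(full_r, 3), round(restricted_r, 3))
```

The second coefficient comes out lower than the first although the underlying relationship is unchanged, which is the sense in which restriction "reduces the correlation mathematically."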

Eight correlations of traits follow: 

General teaching ability with general intellectual ability    +.677 ± .03 
General teaching ability with skill in discipline             +.787 ± .02 
General teaching ability with voice                           +.632 ± .04 
General intellectual ability with voice                       +.625 ± .04 
General intellectual ability with skill in discipline         +.560 ± .04 
Voice with interest in community                              +.500 ± .04 
Voice with skill in discipline                                +.438 ± .06 
Skill in discipline with morals                               +.333 ± .11 

These ratings were made, of course, entirely independent of 
this study and under circumstances calling for unusual care and 
thoroughness. 

Common sense would tell us that the correlation between 
voice (defined on the score card as "voice: pitch, quality, 
clearness of school-room voice") and interest in community is probably 
zero, but here it was found to be +.500, while voice and discipline 
was +.438, and general intellectual capacity and voice was +.626. 
The sizes of the correlations do not correspond to the importance 
of the relationships. 

These data are worthy of more extended treatment than the 
correlations on page 61 would indicate. The inter-correlations 
(120 in number) have been computed for the following traits: 
general appearance, health, voice, intellectual capacity, accuracy, 
self-control, sense of justice, academic preparation, interest in 
the life of community, ability to meet and interest patrons, 
professional interest and growth, use of English, discipline 
(governing skill), attention to individual needs, and general 
development of pupils. 



The most obvious fact about these correlations is monotonous 
similarity. They do not vary with the relevance of the relation- 
ships. Most of them are too high. This fact illustrates again 
the factor of spread of general estimate, which can be shown in 
still a different fashion. 

The distribution of the correlations follows: 

Frequency     Correlation Range 
    4         +.1 to +.2 
    2         +.2 to +.3 
   15         +.3 to +.4 
   38         +.4 to +.5 
   80         +.5 to +.6 
   19         +.6 to +.7 
    8         +.7 to +.8 
    3         +.8 to +.9 

True average +.5 

1. If there were present in these correlations 100 per cent 
spread of general estimate, and if the correlation which was 
typical of the halo effect were +.5, we should not expect all the 
correlations to be exactly +.5. They would vary or be grouped 
around +.5 as a mean in a normal probability curve. That is, 
they would tend to be at +.5, but some would be a little above 
and some a little below. The probable error of the +.5 correlation, 
the number of individuals being 126, is ± .068 by the formula 
$\mathrm{S.D.} = \frac{1 - r^{2}}{\sqrt{n}}$. Using ± .068 as the S. D. 
of the probability curve, we can plot the position of the 120 
correlations as they would occur, if pure chance only were operating. 
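
The chance distribution described here can be sketched numerically. The example assumes that the formula intended is the usual standard deviation of a correlation coefficient, (1 - r²)/√n, and that the chance curve is a normal curve with mean +.5 and S. D. .068 as stated in the text; the function names and the tenth-wide class intervals are illustrative choices.

```python
import math


def sd_of_r(r, n):
    """Standard deviation of a correlation coefficient (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n)


def normal_cdf(x, mean, sd):
    """Proportion of a normal curve lying below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def expected_counts(edges, mean, sd, total):
    """Expected number of coefficients in each interval under pure chance."""
    return [
        total * (normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd))
        for lo, hi in zip(edges, edges[1:])
    ]


sd = sd_of_r(0.5, 126)
print(round(sd, 3))

# Class intervals +.1 to +.2, +.2 to +.3, ... , +.8 to +.9.
edges = [i / 10 for i in range(1, 10)]
expected = expected_counts(edges, mean=0.5, sd=0.068, total=120)
for lo, hi, count in zip(edges, edges[1:], expected):
    print(f"+{lo:.1f} to +{hi:.1f}: {count:5.1f}")
```

Under these assumptions nearly all of the 120 coefficients would be expected between +.3 and +.7, with the two middle intervals holding the bulk of them.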

If we place the distribution of the correlations as we have them 
upon a distribution as they would occur by pure chance, then we 
can see very clearly what part of the total number of correlations 
need other explanation than that of pure chance variation from a 
typical one. The following chart shows this comparison: 




COMPARISON OF NORMAL CURVE AND DISTRIBUTION OF CORRELATIONS. THE 
SOLID LINE IS THE NORMAL CURVE, THE BROKEN LINE THE DISTRIBUTION 
OF CORRELATIONS. CENTRAL TENDENCY IN BOTH CASES +.5. 



2. Of the 120 correlations, 15 lie beyond the limits of 
pure-chance variations from a mean and 105 lie within the limits of 
chance variations. Within these limits the positions of the 
correlations are not far dissimilar from what they would be, if chance 
only were operating. For 105 of the correlations no other facts 
than 100 per cent spread of general estimate and chance variation 
are needed. 

The reader must determine for himself how significant the other 
15 are. The 15 correlations which are not explainable by 
chance follow: 

Voice and Moral Influence                                          +.172 
Intellectual Capacity and Moral Influence                          +.173 
Intellectual Capacity and Academic Preparation                     +.126 
Academic Preparation and Ability to Meet and Interest Patrons      +.158 
Accuracy and Moral Influence                                       +.218 
General Appearance and Health                                      +.727 
Accuracy and Attention to Individual Needs                         +.725 
Self-Control and Sense of Justice                                  +.872 
Sense of Justice and Ability to Meet and Interest Patrons          +.766 
Sense of Justice and Use of English                                +.771 
Sense of Justice and Attention to Individual Needs                 +.822 
Professional Interest and Growth and Use of English                +.748 
Use of English and General Development of Pupils                   +.720 
Discipline and General Development of Pupils                       +.787 
Attention to Individual Needs and General Development of Pupils    +.807 
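
The listed coefficients can be checked against a chance band around +.5. The text does not state how wide the "limits of pure-chance variation" were taken to be; the sketch below assumes, purely for illustration, limits of three times the S. D. of .068 on either side of the mean, which happens to separate these 15 coefficients into low and high groups.

```python
MEAN = 0.5
SD = 0.068
K = 3  # assumed width of the chance band in S. D. units; not stated in the text

# The 15 coefficients listed above, in the order given.
listed = [
    0.172, 0.173, 0.126, 0.158, 0.218,
    0.727, 0.725, 0.872, 0.766, 0.771,
    0.822, 0.748, 0.720, 0.787, 0.807,
]

lower = MEAN - K * SD
upper = MEAN + K * SD

too_low = [r for r in listed if r < lower]
too_high = [r for r in listed if r > upper]

print(f"band: {lower:.3f} to {upper:.3f}")
print(len(too_low), "below the band;", len(too_high), "above the band")
```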

It will be seen that 5 of these correlations are too low and 10 are 
too high for explanation on a basis of mathematical chance. 
Take the correlation of voice with moral influence of +.172. 
Why should voice correlate with moral influence so loosely and 
correlate +.682 with intellect, .628 with accuracy, .454 with academic 
preparation, and .500 with interest in the life of the community? 
Why should voice correlate with accuracy as highly as it does with 
skill in discipline? In fact, it is not clear why the 5 especially low 
correlations are the ones which they happen to be, instead of 
being a part of any other 50 that one might pick almost at random. 
The correlations that are so high that they are beyond the range of 
chance variation from the mean are also a little difficult to 
explain from any necessary relationships. Three of them have 
general development of the pupils as one factor and use of 
English, discipline, and attention to individual needs as the others. 
Very likely these show true relationships, but the same data show 
equally high mutual relationships between sense of justice and 
use of English; sense of justice and ability to meet people; 
self-control and sense of justice. Of the 120 correlations 105, or 87 
per cent, could be explained by mere chance variation, if the 
statement "there is as much correlation between any two traits as 
between any other two" were literally true. The remaining 15 
coefficients of correlation perhaps can best be accounted for by a 
mixture of true insight and more or less of the usual amount of 
tendency to spread one's general estimate over particular 
judgments. 

DATA ON BOYCE'S SCORE CARD 

The score card upon which the ratings used in the last section 
were made is the work of A. C. Boyce. His study is reported in 
the Fourteenth Year-Book of the National Society for the Study of 
Education, Part IV. It is reported in the introduction of this 
study. This study is perhaps the most extended and the best one 
on the rating of teachers. Boyce's original data, as reported, con- 
tain some very good evidences of the factor of spread of general 
estimate. As this point is not stressed in the Boyce report, it 
might be well to close this discussion with a consideration of that 
report. 

Boyce found the following correlations between general teach- 
ing ability and forty-five traits: 

General Teaching Ability with                      r     Rank 

 1. General Appearance                           +.47     43 
 2. Health                                       +.66     39 
 3. Voice                                        +.63     42 
 4. Intellectual Capacity                        +.62     34 
 5. Initiative and Self-reliance                 +.77     13 
 6. Adaptability and Resourcefulness             +.80     11 
 7. Accuracy                                     +.74     17 
 8. Industry                                     +.69     24 
 9. Enthusiasm and Optimism                      +.71     22 
10. Integrity and Sincerity                      +.63     33 
11. Self-control                                 +.66     30 
12. Promptness                                   +.66     29 
13. Tact                                         +.69     26 
14. Sense of Justice                             +.61     36 
15. Academic Preparation                         +.41     44 
16. Professional Preparation                     +.38     45 
17. Grasp of Subject-matter                      +.72     19 
18. Understanding of Children                    +.76     16 
19. Interest in the Life of the School           +.66     31 
20. Interest in the Life of the Community        +.62     36 
21. Ability to Meet and Interest Patrons         +.61     38 
22. Interest in Lives of Pupils                  +.69     26 
23. Cooperation and Loyalty                      +.66     28 
24. Professional Interest and Growth             +.72     18 
25. Daily Preparation                            +.68     27 
26. Use of English                               +.66     40 
27. Care of Light, Heat, and Ventilation         +.61     37 
28. Neatness of Room                             +.54     41 
29. Care of Routine                              +.64     32 
30. Discipline (Governing Skill)                 +.79     12 
31. Definiteness and Clearness of Aim            +.81     10 
32. Skill in Habit Formation                     +.86      5 
33. Skill in Stimulating Thought                 +.84      8 
34. Skill in Teaching How to Study               +.84      7 
35. Skill in Questioning                         +.72     20 
36. Choice of Subject-matter                     +.85      6 
37. Organization of Subject-matter               +.87      3 
38. Skill and Care in Assignment                 +.82      9 
39. Skill in Motivating Work                     +.74     16 
40. Attention to Individual Needs                +.76     14 
41. Attention and Response of the Class          +.86      4 
42. Growth of Pupils in Subject-matter           +.87      2 
43. General Development of Pupils                +.88      1 
44. Stimulation of Community                     +.70     23 
45. Moral Influence                              +.71     21 

These correlations cannot be taken at their face value for two 
reasons: (a) They have not been corrected for attenuation and, 
hence, are far too low. (b) The procedure by which they were 
obtained injects an error which would make them too high. 

These correlations are based upon data collected from 39 
schools. That is, in 39 schools some judge rated the teachers for 
the traits which have been mentioned. All the ratings were then 
put on one correlation table and the mutual relationships were 
worked out. Had the correlation for each of the 39 original rat- 
ings been worked out and the mean and variability of the distri- 
bution of these correlations been given, we should know what we 
had. 
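
Why pooling before correlating is not the same as correlating within each school can be shown with a deliberately artificial case. The ratings below are invented for illustration and are not Boyce's data: within each of two hypothetical schools the two traits are entirely unrelated, but one judge marks everyone higher than the other, and the pooled table yields a high coefficient.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented ratings (trait X, trait Y) from two hypothetical schools.
# School A's judge marks low throughout; School B's judge marks high.
school_a = [(1, 2), (2, 1), (1, 1), (2, 2)]
school_b = [(4, 5), (5, 4), (4, 4), (5, 5)]


def r_of(pairs):
    return pearson([x for x, _ in pairs], [y for _, y in pairs])


within_a = r_of(school_a)
within_b = r_of(school_b)
pooled_r = r_of(school_a + school_b)

print(within_a, within_b, round(pooled_r, 3))
```

Here the pooled coefficient reflects only the difference between the two judges' standards, which is the kind of error that reporting the separate correlations, with their mean and variability, would expose.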

When, however, ratings from 39 sets of teachers and from as 
many different judges are combined before they are correlated, 
we do not know just what we have. At best, the resulting corre- 
lations form a composite of the correlations between the respective 
pairs of traits, plus an erroneous pooling of sets of data. The re- 
sult is by no means a simple correlation between the traits, such 
as Boyce took for granted. Boyce gave as statistical reference 
Thorndike's Mental and Social Measurements, page 172 et seq. 
Neither Thorndike nor any other statistician would justify 
statistical liberties of the kind that Boyce took. It is also impossible 
to compute the reliability of his coefficients of correlation. 

Assuming, moreover, that these two errors (lack of correction 
and improper treatment of data) check each other and, by good 
fortune, the correlations, as presented, are true, what are the 
probabilities of the correlations harboring a vicious spread of 
general estimate? 

The only reference Boyce made to the factor of spread or 
absence of analysis is on page 42, when he discussed the value of 
judging for separate traits. "The topics must not be too few," he 
said, "for either they will be so general that little analysis is made, 
or, if not general, they will be sure to leave out important points." 

In discussing the significance of the correlations between general 
teaching ability and specific traits, Boyce does not seem to have 
been much impressed with the factor of spread. Before assuming 
its absence, however, he should have computed the correlations 
between the respective pairs of traits. In other data we find that 
the correlations range the same as do the correlations between 
specific traits and general teaching ability. This has not been 
done by Boyce, nor can it be done from any of his reported data. 
The best evidence for the presence or absence of spread of general 
estimate is, therefore, not available. 

The distribution is as follows: 

Frequency     Correlation Range 
    0         +.300 to +.350 
    1         +.350 to +.400 
    2         +.400 to +.450 
    3         +.450 to +.500 
    4         +.500 to +.550 
    6         +.550 to +.600 
    7         +.600 to +.650 
    8         +.650 to +.700 
    8         +.700 to +.750 
    4         +.750 to +.800 
    5         +.800 to +.850 
    6         +.850 to +.900 

The average correlation between general teaching ability and 
specific traits is +.70. This is, of course, far too high. 
Unfortunately, we do not know what variations from +.70 pure 
chance would explain. Within a variation of ± .10, 60 per cent 
of the correlations fall. Further, within a variation of ± .15, 85 
per cent of the correlations fall. If teaching is a complex process 
and if the traits recorded are distinct and specific, even to a small 
degree, the fact that 85 per cent of the correlations are found 
within a range of ± .15 is exceedingly suggestive of the presence 
of this spread of general estimate. 
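
The proportion of coefficients lying within a given variation of the average can be counted as in the sketch below. The ten coefficients are hypothetical values chosen only to show the counting, not Boyce's figures, and they are held in hundredths so that values exactly on the boundary are counted without rounding trouble.

```python
def share_within(values_hundredths, center, half_width):
    """Fraction of coefficients within +/- half_width of center.

    All three arguments are in hundredths (e.g. 70 means +.70).
    """
    inside = [v for v in values_hundredths if abs(v - center) <= half_width]
    return len(inside) / len(values_hundredths)


# Hypothetical coefficients, in hundredths; illustrative only.
coefficients = [38, 47, 62, 66, 69, 71, 74, 80, 85, 88]

within_10 = share_within(coefficients, center=70, half_width=10)
within_15 = share_within(coefficients, center=70, half_width=15)

print(within_10, within_15)
```

The narrower the band that still holds most of the coefficients, the less the ratings distinguish one trait from another.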

A cursory examination of the individual correlations suggests 
the same thing. There is only .03 difference between the 
correlation of academic preparation and the correlation of professional 
preparation with general teaching ability! Within a range of .04 
come the correlations of industry, enthusiasm, tact, grasp of 
subject-matter, interest in lives of the pupils, professional growth, 
daily preparation, skill in questioning, moral influence, and 
stimulation of community with general teaching ability. The least 
important trait of all is professional preparation! Within ± .04 
the following are of the same significance: care of routine, 
intellectual capacity, integrity and sincerity, self-control, promptness, 
sense of justice, interest in the life of the school, interest in the life 
of community, ability to meet and interest patrons, 
cooperation and loyalty, daily preparation, and care of light, heat, 
and ventilation. This decided monotony of the size of the 
correlations, which are obviously too high, is patent witness of the 
presence of spread of general estimate. 

In our consideration of the correlations between general 
teaching ability, intellectual strength, and skill in discipline for Towns 
A, B, C, the fact that analysis of general worth into specific traits 
was not as complete as one would have ordinarily supposed is 
statistically demonstrated. When the ratings of 129 teachers in 
a New York school system for 45 separate traits were examined, 
the evidence again showed that analyzed judgments are far from 
being beyond question. 

In Boyce's study, as reported, while complete statistical treat- 
ment is out of the question, the correlations, as given, do not show 
the range that common sense would lead us to expect. Their 
monotonous similarity also suggests that, when analyzed judg- 
ments are attempted, the influence of general estimate is so strong 
that the resulting analyses are perhaps even more justifications 
of the general estimates than they are judgments of the specific 
trait. 

The purpose of this chapter has been to present data which 
show that general estimate permeates judgments of specific 
traits to a degree which has not hitherto been sufficiently 
emphasized. 